Oldenburg Logatome Corpus (OLLO)

Version history:

Version 2.0:

  • 10 French-speaking people have been recorded in Mons, Belgium by Multitel. Speech data from those speakers has been added to the download, which makes OLLO2 a multi-lingual database.
  • The speakers recorded essentially the same 150 logatomes that are contained in the German part of the database. In order to have similar pronunciations, the transcription of the logatomes was adjusted accordingly. For the German phonemes [I] and [U] (as defined in the SAMPA alphabet), which don't exist in French, the adjustment has been chosen in such a way that French logatomes closely resemble the German ones.
  • Additionally, 50 phonetically balanced sentences have been recorded in all six variabilities (slow and fast, soft and loud speaking style, as well as question and statement).
  • The German speech packages (OLLO1.4_[NO, EF, BV, EP].ZIP) remain unchanged and don't have to be downloaded again if you already have OLLO1.4.

Version 1.4:

  • One of the speakers (S03) has been re-recorded, since the pronunciation of some VCVs was inconsistent with VCV utterances from other speakers. For example, S03 had pronounced logatome 27 as [E S E], whereas the normal pronunciation should have been [E S @]. With our baseline MFCC-HTK system (closed test), this update improved the average recognition score for speaker S03 from 78.9 % to 87.6 %.
  • Fixed phonetic labels: Some of the label files of logatome L071 contained the transcription [d a t] instead of [d a: t].
  • The files from dialect regions East Frisia, Bavaria and East Phalia remain unchanged. If you already downloaded OLLO1.3, you can use the archives OLLO1.3_EF.ZIP, OLLO1.3_BV.ZIP and OLLO1.3_EP.ZIP instead of OLLO1.4_*.ZIP.

Version 1.3:

  • The OLLO corpus was phonetically time-labeled, i.e., temporal positions of phoneme boundaries have been determined for each utterance, making it suitable for tasks such as training of phoneme recognizers. The labels may be downloaded from here [30.5 MB].
  • During the manual processing, clicks in 800 files were overwritten with zeros instead of being cut out with cross-fading applied. This resulted in short passages with zero amplitude instead of microphone noise, which might be problematic for certain types of feature extraction. These files were manually edited again, and cross-fading was applied where clicks had to be removed. Furthermore, 44 files of this subset were deleted.
  • In versions 1.1 and 1.2, the genders of speakers 10 and 11 were labeled incorrectly (speaker 10 is male, speaker 11 is female); this was fixed in V1.3.
  • If you have limited download bandwidth, you may use the update file from version 1.2 to 1.3 (instead of downloading the files linked below). Before replacing the speech files, the rename script in the archive has to be executed! Have a look at update_readme.txt in the zip file for details. The update contains batch and shell script files (for Windows and Linux systems) for automatic correction of the labels and deletion of the 44 utterances. The corrected speech files without the zero-amplitude passages are also included.
    Download OLLO Update V1.2 -> V1.3 [18.8 MB]
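
The zero-amplitude problem and the cut-and-cross-fade repair described above can be illustrated with a small sketch. This is not the procedure or tooling that was used for the corpus (the editing was done manually); the function names, the minimum run length and the linear fade shape are assumptions chosen for illustration, and the signal is assumed to be a plain list of sample values.

```python
def find_zero_runs(samples, min_len=8):
    """Return (start, end) pairs of runs of exact zeros, end exclusive.

    Only runs of at least min_len samples are reported; min_len is an
    arbitrary illustrative threshold, not a value taken from OLLO.
    """
    runs = []
    start = None
    for i, s in enumerate(samples):
        if s == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # A run may extend to the very end of the signal.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs


def cut_with_crossfade(samples, start, end, fade_len):
    """Remove samples[start:end] and blend the cut edges with a linear cross-fade.

    The fade_len samples before `start` fade out while the fade_len samples
    from `end` onward fade in, so the result is shorter than the input by
    (end - start) + fade_len samples.
    """
    if start > end or start < fade_len or end + fade_len > len(samples):
        raise ValueError("not enough signal around the cut for the requested fade")
    head = samples[:start - fade_len]
    fade_out = samples[start - fade_len:start]
    fade_in = samples[end:end + fade_len]
    tail = samples[end + fade_len:]
    blended = []
    for k in range(fade_len):
        w = (k + 1) / (fade_len + 1)  # weight of the incoming signal
        blended.append((1.0 - w) * fade_out[k] + w * fade_in[k])
    return list(head) + blended + list(tail)
```

Calling find_zero_runs on a decoded file would flag passages that were zeroed out rather than cut, and cut_with_crossfade shows why the cut variant leaves no silent gap: the signal on both sides of the removed span is blended directly.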

Version 1.2:

  • In order to obtain comparable results for different feature types / ASR systems, lists for training and testing have been generated (both for speaker dependent and speaker independent recognition). See the readme in OLLO1.2_TRAINING_TEST_LISTS.ZIP for details.
  • A total of 495 defective files were deleted from the corpus. These files contained either noise only or incomplete or incorrect utterances. A list of these files can be downloaded here (in case you already have V1.1 and don't want to download the speech archives again).

Version 1.1:

  • Files from speaker 15 (which contained artifacts like mouse clicks) were replaced with new recordings, and audio data for speaker 20 was added.
  • A zip package with calibration files is included in this release.

Copyright notice: Permission to use this database for purely research or educational purposes is granted. No commercial exploitation of this database is permitted unless permission has been obtained separately from UNIVERSITAET OLDENBURG (contact address: medi-ollo_AT_listserv.uni-oldenburg.de). Copyright 2005, Medizinische Physik, Universitaet Oldenburg, Germany. All rights reserved.


