LatticeWordSegmentation

Software for unsupervised word segmentation of lattices or text sequences using a nested hierarchical Pitman-Yor language model.








For questions, suggestions, or problems, please send an email or check the discussion group.

Oliver Walter: [email protected]

Discussion group: email: [email protected] google groups:




Iterative Bayesian Word Segmentation for Unsupervised Vocabulary Discovery from Phoneme Lattices. Jahn Heymann, Oliver Walter, Reinhold Haeb-Umbach, Bhiksha Raj. In 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014).

Unsupervised Word Segmentation from Noisy Input. Jahn Heymann, Oliver Walter, Reinhold Haeb-Umbach, Bhiksha Raj. In Automatic Speech Recognition and Understanding Workshop (ASRU 2013).

Learning a Language Model from Continuous Speech. Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara. In Proceedings of InterSpeech 2010.


Manual installation


Import the project into KDevelop (or another IDE).

Set the CMake build path to $GITROOT/build/ (next to the src/ and test/ directories).

Install openFST from

Required Boost packages: boost_system, boost_filesystem

Note: For better performance, use a release build (-O3 -DNDEBUG).
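
For a build outside an IDE, the steps above might look roughly like the following sketch. It assumes the top-level CMakeLists.txt sits in $GITROOT and that openFST and Boost are already installed where CMake can find them. The helper name build_release is hypothetical and not part of the project. With GCC or Clang, CMake's default Release configuration normally uses -O3 -DNDEBUG, unless the project's own CMake files override it.

```shell
# Hypothetical helper (not part of the project): out-of-source release build.
# Argument: path to the repository checkout (GITROOT).
# Assumes the top-level CMakeLists.txt is located in GITROOT.
build_release() (
  gitroot="${1:?usage: build_release GITROOT}"
  # Build directory next to src/ and test/, as described above.
  mkdir -p "$gitroot/build" || exit 1
  cd "$gitroot/build" || exit 1
  # Release build type: optimized flags (by default -O3 -DNDEBUG with GCC/Clang).
  cmake -DCMAKE_BUILD_TYPE=Release .. || exit 1
  make
)
```

It would be called as build_release "$GITROOT". The function body runs in a subshell, so the caller's working directory is left unchanged.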


Automatic installation


run, this will also install boost and openfst in the tools directory




For demonstrations, see the scripts in the test/ folder.