KenLM-training

How to generate trie file?

Open EuphoriaCelestial opened this issue 4 years ago • 4 comments

Hi, I have successfully run all the steps in the README and now have bible.arpa and bible.binary, but there is no trie file. How can I generate the trie? I can't find any tutorial about this.

EuphoriaCelestial avatar Oct 23 '20 04:10 EuphoriaCelestial

Hey @EuphoriaCelestial, trie is a data structure that's used when binarizing the model. Please have a look here for more info: kenlm/data-structures.

So, just using the trie switch should solve the issue.
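To make "the trie switch" concrete, here is a minimal Python sketch that assembles a `build_binary` call with `trie` as the model type. The `kenlm/bin` location and the `-T`/`-S` options are taken from the command quoted later in this thread; the helper names and the output name `bible.trie.binary` are made up for illustration.

```python
import os
import subprocess
from typing import List, Optional


def build_binary_command(
    arpa_path: str,
    output_path: str,
    kenlm_bin: str = "kenlm/bin",       # assumed location of the KenLM binaries
    temp_prefix: Optional[str] = None,  # passed as -T (temporary file prefix)
    memory: Optional[str] = None,       # passed as -S (memory for building), e.g. "1G"
) -> List[str]:
    """Return the argv list for a build_binary call that uses the trie type."""
    cmd = [os.path.join(kenlm_bin, "build_binary")]
    if temp_prefix is not None:
        cmd += ["-T", temp_prefix]
    if memory is not None:
        cmd += ["-S", memory]
    # The model type ("trie") goes before the input .arpa and the output file.
    cmd += ["trie", arpa_path, output_path]
    return cmd


def run_build(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Calling `run_build(build_binary_command("bible.arpa", "bible.trie.binary", temp_prefix="/tmp/trie", memory="1G"))` would then invoke the tool, provided KenLM is actually built at that path.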

kmario23 avatar Oct 23 '20 08:10 kmario23

I have tried this command: kenlm/bin/build_binary -T /tmp/trie -S 1G trie bible.arpa bible.binary, but I get this error every time:

Reading bible.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Segmentation fault (core dumped)

EuphoriaCelestial avatar Oct 23 '20 09:10 EuphoriaCelestial

This seems to be a recurring issue. Cf. kenlm/issues/248, /letter-based-language-model/33986

Some suggestions:

  • There's a Discourse forum for DeepSpeech-related issues (https://discourse.mozilla.org/c/mozilla-voice-stt/247) where you can get help.
  • Recheck that all dependencies are installed correctly, or reinstall kenlm. The Boost libs might cause issues.
  • A segmentation fault (core dumped) comes from the C/C++ code. It seems to me that something may be wrong with the .arpa file.
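
One way to test the ".arpa file" suspicion is to check the file's structure: the `\data\` header declares how many n-grams of each order follow, and the file should end with `\end\`. The sketch below is a pure-Python sanity check that compares the declared counts with the entries actually present. It does not use KenLM, the function name `check_arpa` is made up for illustration, and a clean result does not rule out other causes of the crash.

```python
import re
from typing import Dict, Iterable, List, Optional

COUNT_RE = re.compile(r"^ngram (\d+)=(\d+)$")   # header line, e.g. "ngram 2=1234"
SECTION_RE = re.compile(r"^\\(\d+)-grams:$")    # section marker, e.g. "\2-grams:"


def check_arpa(lines: Iterable[str]) -> List[str]:
    """Return a list of structural problems found in ARPA-formatted lines."""
    declared: Dict[int, int] = {}   # counts announced in the \data\ header
    actual: Dict[int, int] = {}     # entries actually seen per n-gram order
    order: Optional[int] = None     # order of the section being read
    in_header = False
    saw_end = False

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            in_header = True
            continue
        if line == "\\end\\":
            saw_end = True
            break
        section = SECTION_RE.match(line)
        if section:
            order = int(section.group(1))
            actual.setdefault(order, 0)
            in_header = False
            continue
        count = COUNT_RE.match(line)
        if in_header and count:
            declared[int(count.group(1))] = int(count.group(2))
            continue
        if order is not None:
            actual[order] += 1      # any other line inside a section is an entry

    problems = []
    if not declared:
        problems.append("no 'ngram N=count' lines found in the \\data\\ header")
    for n, expected in sorted(declared.items()):
        found = actual.get(n, 0)
        if found != expected:
            problems.append(f"{n}-grams: header declares {expected}, found {found}")
    if not saw_end:
        problems.append("missing \\end\\ marker (file may be truncated)")
    return problems
```

To run it on a real file, pass an open file handle, for example `with open("bible.arpa", encoding="utf-8") as f: print(check_arpa(f))`. An empty list means the header counts and section contents agree.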

kmario23 avatar Oct 23 '20 11:10 kmario23

I have tried a clean install on another machine with better specs (i7, 32 GB RAM, 2080 Ti) but still got the same error. The .arpa file seems fine; I think so because I can use it to score sentences normally, and it gives the correct score for the example in the README.

EuphoriaCelestial avatar Oct 24 '20 02:10 EuphoriaCelestial