kenlm
STT AI python file failing because of kenlm
Why is this happening?
python3 generate_lm.py --input_txt data.txt --output_dir . --top_k 2 --kenlm_bins /mnt/c/Users/eliso/speech2text/STT/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
Converting to lowercase and counting word occurrences ... | |# | 198 Elapsed Time: 0:00:00
Saving top 2 words ...
Calculating word statistics ...
Your text file has 398 words in total
It has 3 unique words
Your top-2 words are 85.1759 percent of all words
Your most common word "sentence" occurred 199 times
The least common word in your top-k is "another" with 140 times
The first word with 199 occurrences is "sentence" at place 0
Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /mnt/c/Users/eliso/speech2text/STT/data/lm/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Traceback (most recent call last):
File "generate_lm.py", line 232, in
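The word-statistics lines in the log are simple arithmetic: the top-2 words cover 199 + 140 = 339 of the 398 words, which is 85.1759 percent. The sketch below reproduces that summary with `collections.Counter`. It is an illustration under that assumption, not the actual code in generate_lm.py, and the function name `word_statistics` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_statistics(text: str, top_k: int) -> Dict[str, object]:
    """Lowercase the text, count words, and summarise the top-k vocabulary.

    Hypothetical helper; it mirrors the numbers generate_lm.py prints,
    not its implementation.
    """
    counts = Counter(text.lower().split())
    if not counts:
        raise ValueError("input text contains no words")
    top: List[Tuple[str, int]] = counts.most_common(top_k)
    total = sum(counts.values())
    top_total = sum(count for _, count in top)
    return {
        "total_words": total,
        "unique_words": len(counts),
        "top_k_percent": 100.0 * top_total / total,
        "most_common": top[0],
        "least_common_in_top_k": top[-1],
    }


if __name__ == "__main__":
    with open("data.txt", encoding="utf-8") as f:
        print(word_statistics(f.read(), top_k=2))
```

For example, a text containing "sentence" 199 times, "another" 140 times and some third word 59 times reproduces the logged figures: 398 words in total, 3 unique words, and 85.1759 percent after rounding to four decimals. The third word is hypothetical; only its count follows from the log.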
Windows is only supported by other Windows users. But are you using the latest version of the code from this repository? There was an earlier problem that caused segfaults. Also try an absolute memory setting such as 2.5G (for example --max_arpa_memory 2.5G instead of "85%") in case there's some 32-bit weirdness.
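Following the suggestions above, a rerun could first confirm that the KenLM binaries are really in the --kenlm_bins directory, then reuse the same flags with only the memory limit changed to 2.5G. This is a minimal sketch, assuming generate_lm.py calls executables named lmplz, filter and build_binary from that directory. The helper names `missing_kenlm_bins` and `build_generate_lm_cmd` are hypothetical, and `sys.executable` stands in for `python3`.

```python
import os
import subprocess
import sys
from typing import Iterable, List

# Assumption: generate_lm.py runs these executables from --kenlm_bins.
KENLM_BINARIES = ("lmplz", "filter", "build_binary")
KENLM_BINS = "/mnt/c/Users/eliso/speech2text/STT/kenlm/build/bin/"


def missing_kenlm_bins(bin_dir: str, names: Iterable[str] = KENLM_BINARIES) -> List[str]:
    """Return the binaries that are absent or not executable in bin_dir."""
    missing = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


def build_generate_lm_cmd(max_arpa_memory: str = "2.5G") -> List[str]:
    """Rebuild the original command, changing only --max_arpa_memory."""
    return [
        sys.executable, "generate_lm.py",
        "--input_txt", "data.txt",
        "--output_dir", ".",
        "--top_k", "2",
        "--kenlm_bins", KENLM_BINS,
        "--arpa_order", "5",
        "--max_arpa_memory", max_arpa_memory,
        "--arpa_prune", "0|0|1",
        "--binary_a_bits", "255",
        "--binary_q_bits", "8",
        "--binary_type", "trie",
    ]


if __name__ == "__main__":
    missing = missing_kenlm_bins(KENLM_BINS)
    if missing:
        sys.exit(f"Missing KenLM binaries in {KENLM_BINS}: {missing}")
    # check=False so a failure is reported as an exit status instead of raised.
    result = subprocess.run(build_generate_lm_cmd("2.5G"), check=False)
    print("generate_lm.py exited with status", result.returncode)
```

Passing the command as a list, rather than as a shell string, keeps the `0|0|1` prune value from being treated as shell pipes, so it needs no extra quoting.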