icefall
Mismatch between `L.pt` and `words.txt` in GigaSpeech `prepare.sh`
In the prepare.sh of GigaSpeech: https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/prepare.sh#L188
`L.pt` is built against a word table, and that word table is generated from `lexicon.txt`: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/prepare_lang.py#L354
After that, `words.txt` is generated again, this time from the transcript words in `gigaspeech_supervisions_XL.jsonl.gz`: https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/prepare.sh#L191-L238
This leads to the mismatch between L.pt and words.txt.
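
To make the mismatch concrete, here is a minimal sketch of how the two word tables could be compared. It is not part of icefall: the helper names `parse_symbol_table` and `find_mismatches` are hypothetical, and it assumes both tables use the plain-text `word id` format with one entry per line. The sample tables are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def parse_symbol_table(text: str) -> Dict[str, int]:
    """Parse 'word id' lines into a dict (assumed words.txt format)."""
    table: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, idx = fields
        table[word] = int(idx)
    return table


def find_mismatches(
    lexicon_words: Dict[str, int], transcript_words: Dict[str, int]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (word, id_in_lexicon_table, id_in_transcript_table) for
    every word whose ID differs or that is missing from one table."""
    mismatches = []
    for word in sorted(set(lexicon_words) | set(transcript_words)):
        a = lexicon_words.get(word)
        b = transcript_words.get(word)
        if a != b:
            mismatches.append((word, a, b))
    return mismatches


# Made-up example: the table L.pt was built against vs. the regenerated one.
from_lexicon = parse_symbol_table("<eps> 0\nHELLO 1\nWORLD 2\n")
from_transcripts = parse_symbol_table("<eps> 0\nHELLO 1\nTHERE 2\nWORLD 3\n")

for word, old_id, new_id in find_mismatches(from_lexicon, from_transcripts):
    print(f"{word}: lexicon id={old_id}, transcript id={new_id}")
```

In this made-up case, ID 2 means `WORLD` to `L.pt` but `THERE` in the regenerated `words.txt`, which is the kind of disagreement described above.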