
Mismatch between `L.pt` and `words.txt` in GigaSpeech `prepare.sh`

Open yfyeung opened this issue 2 years ago • 0 comments

In the GigaSpeech `prepare.sh`, `L.pt` is created here: https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/prepare.sh#L188

`L.pt` is built relative to a word table, and that word table is generated from `lexicon.txt`: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/prepare_lang.py#L354

After that, `words.txt` is generated again, this time from the transcript words in `gigaspeech_supervisions_XL.jsonl.gz`: https://github.com/k2-fsa/icefall/blob/master/egs/gigaspeech/ASR/prepare.sh#L191-L238

Because the two word lists come from different sources, this leads to a mismatch between `L.pt` and `words.txt`.
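One way to confirm the mismatch is to compare the word table derived from `lexicon.txt` with the `words.txt` produced later from the transcripts. The snippet below is a minimal sketch, not part of icefall: the helper names `parse_symbol_table` and `diff_symbol_tables` are hypothetical, and it assumes both tables use the plain-text `<word> <id>` one-entry-per-line layout. The sample entries are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def parse_symbol_table(lines: Iterable[str]) -> Dict[str, int]:
    """Parse lines of the assumed form '<symbol> <integer id>' into a dict."""
    table: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, idx = parts
        table[symbol] = int(idx)
    return table


def diff_symbol_tables(
    a: Dict[str, int], b: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (symbols only in a, symbols only in b, shared symbols whose id differs)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    remapped = sorted(w for w in set(a) & set(b) if a[w] != b[w])
    return only_a, only_b, remapped


if __name__ == "__main__":
    # Hypothetical data: a table built from the lexicon vs. one rebuilt from transcripts.
    lexicon_words = parse_symbol_table(["<eps> 0", "HELLO 1", "WORLD 2"])
    transcript_words = parse_symbol_table(
        ["<eps> 0", "GIGA 1", "HELLO 2", "WORLD 3"]
    )
    only_lex, only_trans, remapped = diff_symbol_tables(lexicon_words, transcript_words)
    print("only in lexicon table:", only_lex)
    print("only in transcript table:", only_trans)
    print("same word, different id:", remapped)
```

With real files, the lists passed to `parse_symbol_table` would be replaced by open file handles for the two tables. Any non-empty result suggests that IDs stored in `L.pt` would be interpreted differently under the regenerated `words.txt`.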

yfyeung, Apr 06 '23 13:04