How to train a bilingual Chinese-English zipformer model for speech recognition
This question is about the model at https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20
How can I train a bilingual Chinese-English zipformer model using icefall/egs/librispeech/ASR/pruned_transducer_stateless7_streaming?
The data preparation in that directory is based on the English speech corpus LibriSpeech, so it contains no data preparation or training procedure for a bilingual corpus.
1. Please follow https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR to prepare your data.
2. Use https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming in place of pruned_transducer_stateless5.
Hi, regarding the second item in your reply: the data preparation in that directory is based on the English speech corpus LibriSpeech, so it contains no data preparation or training procedure for a bilingual corpus.
Please follow step 1 to prepare your data. Step 1 is for a bilingual dataset.
One more question: several lines in icefall/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/asr_datamodule.py reference librispeech_cuts_*.jsonl.gz. Should I replace them with the data prepared according to your step 1?
Yes, you are right.
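For anyone making the same change, here is a minimal sketch of what the replacement in asr_datamodule.py could look like. It is not the recipe's actual code. The helper names `cuts_path` and `load_cuts`, the `tal_csasr` prefix, and the split names `train_set`, `dev_set`, and `test_set` are assumptions for illustration; check the file names that your data preparation actually wrote to the manifest directory. The only library call used is `load_manifest_lazy` from lhotse.

```python
from pathlib import Path

# Assumption: the prepared manifests are named "<prefix>_cuts_<split>.jsonl.gz".
# Adjust the prefix and split names to match the files produced in step 1.
CORPUS_PREFIX = "tal_csasr"  # hypothetical prefix, replaces "librispeech"


def cuts_path(manifest_dir, split, prefix=CORPUS_PREFIX):
    """Build the path of the cuts manifest for one split (hypothetical helper)."""
    return Path(manifest_dir) / f"{prefix}_cuts_{split}.jsonl.gz"


def load_cuts(manifest_dir, split):
    """Lazily load the cuts for one split, failing early if the file is missing."""
    path = cuts_path(manifest_dir, split)
    if not path.is_file():
        raise FileNotFoundError(f"Manifest not found: {path}")
    # Imported here so the path helper above works without lhotse installed.
    from lhotse import load_manifest_lazy

    return load_manifest_lazy(path)


if __name__ == "__main__":
    # Show which files the data module would look for (split names are assumed).
    for split in ("train_set", "dev_set", "test_set"):
        print(cuts_path("data/fbank", split))
```

In the data module, each method that currently loads a `librispeech_cuts_*.jsonl.gz` file would then return something like `load_cuts(self.args.manifest_dir, "train_set")` instead.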