icefall
WIP: new recipe: lightweight CTC model for LibriSpeech
This PR includes:
- A new recipe with a lightweight CTC model. The encoder is similar to MobileNet V2 and has just 1.76M parameters with the phone lexicon and the default model settings, yet its WER is not far behind that of the TDNN-LSTM-CTC recipe. With a BPE-500 lexicon, the parameter count is 2.09M. I work on ASR for ultra-low-power edge devices, so it's interesting to see what 1-2M parameters can do. As a next step, I'll explore other lightweight models that work well on ImageNet/CIFAR-100 (such as MobileViT) and try to adapt them to the ASR task.
- Changes to checkpoint.py that enable it to average state dicts containing integer data types, such as "num_batches_tracked" in nn.BatchNorm2d. Currently this throws an error. My approach is to cast such entries to float, do the averaging, and cast them back. I'm not sure this is a robust approach.
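The first item describes the encoder only as "similar to MobileNet V2". For readers unfamiliar with that design, here is a minimal sketch of a generic MobileNet V2 inverted residual block in PyTorch, plus a helper that counts trainable parameters. It is not the recipe's encoder: the class name InvertedResidual, the helper count_params, the channel sizes, and the expansion ratio of 6 are illustrative assumptions, and how the recipe stacks its blocks over the (time, frequency) feature map is not described in this thread.

```python
import torch
from torch import nn


class InvertedResidual(nn.Module):
    """Generic MobileNet V2 inverted residual block (illustrative only).

    expand (1x1 conv) -> depthwise 3x3 conv -> linear projection (1x1 conv),
    with a skip connection when input and output shapes match.
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand_ratio: int = 6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion to a wider hidden representation
            nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise conv: groups=hidden gives one filter per channel
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride,
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection back down (no activation afterwards)
            nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out


def count_params(model: nn.Module) -> int:
    # Trainable parameters only; BatchNorm running stats are buffers, not parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

With in_ch = out_ch = 32 and an expansion ratio of 6, count_params reports 14,848 for one block: two 1x1 convolutions with 6,144 weights each, a depthwise 3x3 convolution with only 1,728 weights, and 832 BatchNorm affine parameters. Because the depthwise convolution costs just 9 weights per channel, most of the budget sits in the 1x1 convolutions, which is what keeps this family of encoders small.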
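The cast-average-cast-back approach in the checkpoint.py item can be sketched as follows. This illustrates the idea only and is not the code in this PR: the function name average_state_dicts is hypothetical, and accumulating in float64 and rounding integer entries before casting back are choices made here (a plain cast back to an integer dtype would truncate instead).

```python
from typing import Dict, List

import torch


def average_state_dicts(state_dicts: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Hypothetical helper: average state dicts, including integer entries."""
    n = len(state_dicts)
    avg: Dict[str, torch.Tensor] = {}
    for key, first in state_dicts[0].items():
        orig_dtype = first.dtype
        # Accumulate in float64 so integer buffers (e.g. BatchNorm's
        # num_batches_tracked) can be divided without a dtype error.
        # clone() avoids mutating the input when it is already float64.
        total = first.to(torch.float64).clone()
        for sd in state_dicts[1:]:
            total += sd[key].to(torch.float64)
        mean = total / n
        if orig_dtype.is_floating_point:
            avg[key] = mean.to(orig_dtype)
        else:
            # Integer entries: round to the nearest value, then cast back.
            avg[key] = mean.round().to(orig_dtype)
    return avg
```

For two nn.BatchNorm2d state dicts whose num_batches_tracked values are 10 and 20, the averaged entry is 15 and keeps its int64 dtype, while floating-point entries such as weight are averaged as usual and keep their original dtype.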
@csukuangfj Thanks for the comments. I've made changes accordingly. Pretrained models are here: https://huggingface.co/wangtiance/lightweight_ctc
But I had trouble uploading the training logs to tensorboard.dev, even with a VPN. Is there an alternative, such as Gitee?
Could you upload the training logs, data/lang_phone, and decoding results to Hugging Face? You can use https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/tree/main as an example.

I can help upload the TensorBoard logs to tensorboard.dev once you have uploaded them to Hugging Face.
Done: https://huggingface.co/wangtiance/lightweight_ctc
Thanks! I downloaded the TensorBoard logs from your repo and uploaded them to the following address:
https://tensorboard.dev/experiment/1JTkrxBrRMie3YDY3k8bYg/#scalars&_smoothingWeight=0
@wangtiance Do you plan to reopen it?
No, I'm opening a new pull request with a new model architecture and better results.
Thanks!