WIP: New recipe: lightweight CTC model for LibriSpeech

Open • wangtiance opened this issue 3 years ago • 4 comments

This PR includes:

  1. A new recipe with a lightweight CTC model. The encoder is similar to MobileNetV2. With the phone lexicon and default model parameters it has only 1.76M parameters, yet its WER is not far behind that of the TDNN-LSTM-CTC recipe. With the BPE-500 lexicon the parameter count is 2.09M. I work on ASR for ultra-low-power edge devices, so it's interesting to see what 1-2M parameters can do. As a next step, I'll explore other lightweight models that work well on ImageNet/CIFAR-100 (such as MobileViT) and try to adapt them to the ASR task.

  2. Changes to checkpoint.py that enable it to average state dicts containing integer tensors, such as "num_batches_tracked" in nn.BatchNorm2d. Currently, averaging such state dicts throws an error. My approach is to cast these tensors to float, average them, and cast the result back. I'm not sure whether this is robust.
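A rough way to see why a MobileNetV2-style encoder (item 1) can stay within 1-2M parameters is to count the weights of its building blocks. The pure-Python sketch below compares a standard k x k convolution with a depthwise separable one, and counts a MobileNetV2 inverted residual block (1x1 expansion, 3x3 depthwise, 1x1 projection). It assumes bias-free convolutions, each followed by a BatchNorm layer with two learnable parameters per channel, as in the original MobileNetV2 design. The helper names and channel sizes are hypothetical and are not taken from the recipe's actual configuration.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Weights of a 2-D conv: each output channel sees c_in / groups input channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k + (c_out if bias else 0)


def bn_params(c: int) -> int:
    """Learnable BatchNorm parameters (weight and bias); running stats are buffers."""
    return 2 * c


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """k x k depthwise conv followed by a 1x1 pointwise conv (BatchNorm not counted)."""
    return conv_params(c_in, c_in, k, groups=c_in) + conv_params(c_in, c_out, 1)


def inverted_residual_params(c_in: int, c_out: int, expand: int, k: int = 3) -> int:
    """MobileNetV2-style block: 1x1 expand -> k x k depthwise -> 1x1 project, each + BN."""
    hidden = c_in * expand
    total = 0
    if expand != 1:  # MobileNetV2 omits the expansion conv when the ratio is 1
        total += conv_params(c_in, hidden, 1) + bn_params(hidden)
    total += conv_params(hidden, hidden, k, groups=hidden) + bn_params(hidden)
    total += conv_params(hidden, c_out, 1) + bn_params(c_out)
    return total


if __name__ == "__main__":
    print(conv_params(256, 256, 3))                    # 589824
    print(depthwise_separable_params(256, 256, 3))     # 67840
    print(inverted_residual_params(64, 64, expand=6))  # 54272
```

At 256 channels the depthwise separable version needs roughly 8.7x fewer weights than the standard 3x3 convolution, which is the main reason encoders built from such blocks stay small.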
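The cast-to-float-and-back approach from item 2 could look roughly like the following. This is a minimal PyTorch illustration, not the actual code in icefall's checkpoint.py; the function name `average_state_dicts` and its list-of-dicts interface are hypothetical. Floating-point tensors are averaged directly, while integer tensors such as "num_batches_tracked" are accumulated in float64, divided, and cast back to their original dtype.

```python
from typing import Dict, List

import torch


def average_state_dicts(
    state_dicts: List[Dict[str, torch.Tensor]],
) -> Dict[str, torch.Tensor]:
    """Hypothetical helper: element-wise mean of state dicts with identical keys."""
    n = len(state_dicts)
    if n == 0:
        raise ValueError("need at least one state dict")
    avg: Dict[str, torch.Tensor] = {}
    for key, first in state_dicts[0].items():
        if first.is_floating_point():
            # clone() so the caller's first state dict is not modified in place
            total = first.clone()
            for sd in state_dicts[1:]:
                total += sd[key]
            avg[key] = total / n
        else:
            # Integer tensors cannot hold a fractional mean: accumulate in
            # float64, divide, then cast back (the cast truncates toward zero).
            total = first.to(torch.float64)
            for sd in state_dicts[1:]:
                total += sd[key].to(torch.float64)
            avg[key] = (total / n).to(first.dtype)
    return avg
```

With this sketch, averaging "num_batches_tracked" values of 100 and 101 gives 100, because the cast back truncates. Accumulating in float64 rather than float32 matters for large counters, since float32 represents integers exactly only up to 2**24. An alternative that skips the float round trip is to sum the integer tensors directly and divide with `torch.div(total, n, rounding_mode="floor")`.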

wangtiance, Aug 12 '22

@csukuangfj Thanks for the comments. I've made changes accordingly. Pretrained models are here: https://huggingface.co/wangtiance/lightweight_ctc

However, I had trouble uploading the training logs to tensorboard.dev, even with a VPN on. Is there an alternative, such as Gitee?

wangtiance, Aug 16 '22

Could you upload the training logs, data/lang_phone, and decoding results to Hugging Face?

You can use https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13/tree/main as an example.

I can help upload the TensorBoard logs to tensorboard.dev once you have uploaded them to Hugging Face.

csukuangfj, Aug 16 '22

Done: https://huggingface.co/wangtiance/lightweight_ctc

wangtiance, Aug 17 '22

Thanks! I just downloaded the tensorboard logs from your repo and uploaded them to the following address:

https://tensorboard.dev/experiment/1JTkrxBrRMie3YDY3k8bYg/#scalars&_smoothingWeight=0

csukuangfj, Aug 19 '22

Do you plan to re-open it? @wangtiance

csukuangfj, Jan 16 '23

No, I'm opening a new pull request with a new model architecture and better results.

wangtiance, Jan 16 '23

Thanks!

csukuangfj, Jan 17 '23