Yifan Peng
Hi @lazykyama, thanks for the investigation. In ESPnet2, did you increase `batch_bins` when using more GPUs? The config behaves differently from ESPnet1, as described here: https://espnet.github.io/espnet/espnet2_training_option.html#the-relation-between-mini-batch-size-and-number-of-gpus In ESPnet2,...
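For concreteness, here is a minimal sketch of the scaling the linked doc implies, assuming that in ESPnet2 the configured `batch_bins` is the total over all GPUs rather than a per-GPU value. The helper name `scaled_batch_bins` is made up for illustration; it is not an ESPnet API.

```python
# A minimal sketch, assuming the ESPnet2 behavior described in the linked doc:
# batch_bins is the total across all GPUs (it is NOT multiplied by ngpu as in
# ESPnet1), so it should be increased manually when more GPUs are used.

def scaled_batch_bins(base_batch_bins: int, base_ngpu: int, new_ngpu: int) -> int:
    """Keep the per-GPU workload roughly constant when changing ngpu."""
    return base_batch_bins * new_ngpu // base_ngpu


# e.g. a config tuned with batch_bins=4000000 on 1 GPU, now run on 4 GPUs:
print(scaled_batch_bins(4_000_000, 1, 4))  # -> 16000000
```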
Thanks @lazykyama for your new investigation. The batch size issue seems quite common. @sw005320 I have updated the three docs and made a PR here: https://github.com/espnet/espnet/pull/4436
Thanks for the great PR! I didn't look into the algorithm itself, but I made a few comments about the `doc` and `init` just now. I think it is already...
I have two questions.
1. Does it support GPU inference?
2. Does it support automatic mixed precision training with `use_amp: true`?

For LibriSpeech, I'm increasing the nonstreaming model size...
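For context on the second question, the sketch below shows the standard `torch.cuda.amp` pattern that a flag like `use_amp: true` typically turns on in a trainer. It uses a dummy model and data purely for illustration; it is not ESPnet's actual training loop.

```python
import torch

# Illustrative sketch of the torch.cuda.amp pattern behind a `use_amp: true`
# style flag (dummy model/data; not ESPnet's trainer code).
model = torch.nn.Linear(80, 256).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass runs in mixed precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                 # unscale grads, then optimizer step
    scaler.update()
```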
> > For LibriSpeech, I'm increasing the nonstreaming model size to 120M and extending the number of epochs to 60.
>
> Does the model need to be so large...
FYI, if we upgrade to a newer version, this warning will be gone, as it has been fixed in the whisper package.
I see. I do not know if there are any other conflicts.
I didn't check it carefully. Why does `tiktoken` affect our code? Do we use it?
Hi, thanks for the question! For LibriSpeech, I did not use the standard segmented version. Instead, I used the "original-mp3" release. I believe this is released along with the segmented version....
I'd like to raise a high-level discussion about the design: should we add more components to the `asr` task? Recently, I have been feeling that the ASR task is becoming more complicated, but some...