Yifan Peng
Hi @lazykyama, thanks for the investigation. In ESPnet2, did you increase `batch_bins` when using more GPUs? The config behaves differently from ESPnet1, as described here: https://espnet.github.io/espnet/espnet2_training_option.html#the-relation-between-mini-batch-size-and-number-of-gpus In ESPnet2,...
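For concreteness, here is a minimal sketch of the scaling the linked doc implies, assuming that in ESPnet2 the configured `batch_bins` is the total over all GPUs rather than a per-GPU value. The helper name `scaled_batch_bins` is made up for illustration; it is not an ESPnet API.

```python
# A minimal sketch, assuming the ESPnet2 behavior described in the linked doc:
# batch_bins is the total across all GPUs (it is NOT multiplied by ngpu as in
# ESPnet1), so it should be increased manually when more GPUs are used.

def scaled_batch_bins(base_batch_bins: int, base_ngpu: int, new_ngpu: int) -> int:
    """Keep the per-GPU workload roughly constant when changing ngpu."""
    return base_batch_bins * new_ngpu // base_ngpu


# e.g. a config tuned with batch_bins=4000000 on 1 GPU, now run on 4 GPUs:
print(scaled_batch_bins(4_000_000, 1, 4))  # -> 16000000
```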
Thanks @lazykyama for your new investigation. The batch size issue seems quite common. @sw005320 I have updated the three docs and made a PR here: https://github.com/espnet/espnet/pull/4436
Thanks for the great PR! I didn't look into the algorithm itself, but I made a few comments about the `doc` and `init` just now. I think it is already...
I have two questions.
1. Does it support GPU inference?
2. Does it support automatic mixed precision training with `use_amp: true`?

For LibriSpeech, I'm increasing the nonstreaming model size...
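For context on the second question, the sketch below shows the standard `torch.cuda.amp` pattern that a flag like `use_amp: true` typically turns on in a trainer. It uses a dummy model and data purely for illustration; it is not ESPnet's actual training loop.

```python
import torch

# Illustrative sketch of the torch.cuda.amp pattern behind a `use_amp: true`
# style flag (dummy model/data; not ESPnet's trainer code).
model = torch.nn.Linear(80, 256).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # forward pass runs in mixed precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                 # unscale grads, then optimizer step
    scaler.update()
```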
> > For LibriSpeech, I'm increasing the nonstreaming model size to 120M and extending the number of epochs to 60.
>
> Does the model need to be so large...
FYI, if we upgrade to a newer version, this warning will be gone, as it has been fixed in the whisper package.
I see. I do not know if there are any other conflicts.
I didn't check it carefully. Why does `tiktoken` affect our code? Do we use it?
Hi, thanks for the question! For LibriSpeech, I did not use the standard segmented version. Instead, I used the "original-mp3" release. I believe this is released along with the segmented version....
I'd like to raise a high-level discussion about the design: should we add more components to the `asr` task? Recently, I have been feeling that the ASR task is becoming more complicated, but some...