rnnt-speech-recognition
Training is not converging. eval_wer sticks at ~95%.
I was finally able to run training on a single GPU (multi-GPU does not seem to work right now), but the word error rate is not dropping.
I did not change anything in the code, and I am using the Common Voice dataset as suggested by the README.md.
As you can see below, the train_loss drops, but the eval_wer goes back up after a slight initial drop:
Any idea where this might come from?
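For anyone reading along: eval_wer is the word error rate, i.e. the word-level edit distance between the model's hypothesis and the reference transcript, divided by the reference length. Here is a minimal sketch of that computation (plain Levenshtein distance over words; not the exact metric code from this repo):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the bat sat"))  # 0.333...
```

A WER stuck near 95% typically means the model is emitting almost nothing or unrelated tokens (mostly deletions and substitutions), not slightly-wrong transcripts, which suggests the decoder output itself is degenerate rather than just under-tuned.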
Excuse me, but I have another question. When I train the model, I always run into "out of memory", like this:
RuntimeError: CUDA out of memory. Tried to allocate 8.05 GiB (GPU 0; 23.62 GiB total capacity; 18.02 GiB already allocated; 2.84 GiB free; 19.59 GiB reserved in total by PyTorch)
I am training on a single GPU with 23.6 GiB of memory. So how did you manage to run the model on only one GPU? Many thanks!
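Not from this repo's code, but a small PyTorch sketch that may help localize the spike: torch.cuda exposes allocation counters, and in RNN-T the joint network output is roughly batch × T × U × vocab, which is a common OOM culprit. Logging memory around the forward and backward of one step shows where the 8 GiB allocation happens (the `model` and `batch` names below are hypothetical placeholders, not this repo's API):

```python
import torch

GIB = 1024 ** 3

def log_gpu_memory(tag: str, device: int = 0) -> None:
    """Print current and peak GPU memory to locate the allocation spike."""
    print(f"[{tag}] allocated={torch.cuda.memory_allocated(device) / GIB:.2f} GiB, "
          f"reserved={torch.cuda.memory_reserved(device) / GIB:.2f} GiB, "
          f"peak={torch.cuda.max_memory_allocated(device) / GIB:.2f} GiB")

# Hypothetical usage around one training step (model/batch are placeholders):
# torch.cuda.reset_peak_memory_stats()
# log_gpu_memory("before forward")
# loss = model(batch)              # RNN-T joint output ~ B x T x U x V often spikes here
# log_gpu_memory("after forward")
# loss.backward()
# log_gpu_memory("after backward")
```

If the spike is in the joint network or loss, the usual mitigations are a smaller batch size and capping utterance and transcript lengths, since the loss lattice grows with their product.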
@PeiyanFlying I am using a rather small batch size, like 8 or 16, on a GeForce 1080 Ti (11 GB VRAM). In fact, multi-GPU seems to be broken at the moment; I am not able to use more than one GPU at this point.
Thank you very much. These days I am working on RNN-T training on LibriSpeech with PyTorch, but with the same config settings as this repository it is easy to run into the OOM problem. I am still investigating. Thanks!
@PeiyanFlying Did you have any success yet? And could you link me to that PyTorch library you're using? I'd like to take a look in case https://github.com/noahchalifour/rnnt-speech-recognition doesn't work for me.
OK, I am working on it. Once the PyTorch library runs successfully, I will give you the link.
@stefan-falk I have also noticed that the model is not converging, and I have been working on a solution for a while. It seems, though, that if you use a small enough dataset (as a test) the model does successfully converge. I did read that in the original paper they use massive batch sizes, and I'm not sure whether that is the reason the model is not converging. Any insights?
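For what it's worth, one way to test the batch-size hypothesis on a single GPU is gradient accumulation, which approximates a large effective batch by accumulating gradients over several small batches before each optimizer step. A minimal PyTorch sketch of the general technique (the model, loss, and loader below are toy stand-ins, not code from this repo):

```python
import torch
from torch import nn

# Toy stand-ins for the real RNN-T model and data loader.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
train_loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(32)]

ACCUM_STEPS = 8  # effective batch = loader batch (4) * 8 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(train_loader):
    loss = loss_fn(model(inputs), targets)
    (loss / ACCUM_STEPS).backward()  # scale so accumulated gradients average, not sum
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This trades wall-clock time for effective batch size, but it would at least show whether a larger effective batch changes convergence.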
@noahchalifour Correct me if I'm wrong... has nobody managed to train the network from this repo to at least 30% WER on LibriSpeech or Common Voice?