rnnt-speech-recognition
Training is not converging. eval_wer sticks at ~95%.
I was finally able to run training on a single GPU (multi-GPU does not seem to work right now), but the word error rate is not dropping.
I did not change anything in the code, and I am using the Common Voice dataset as suggested by the README.md.
As you can see below, the train_loss drops, but the eval_wer goes back up after a slight initial drop:
Any idea where this might come from?
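For anyone reading along: eval_wer is the word error rate, i.e. the word-level edit distance between the model's hypothesis and the reference transcript, divided by the reference length. Here is a minimal sketch of that computation (plain Levenshtein distance over words; not the exact metric code from this repo):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the bat sat"))  # 0.333...
```

A WER stuck near 95% typically means the model is emitting almost nothing or unrelated tokens (mostly deletions and substitutions), not slightly-wrong transcripts, which suggests the decoder output itself is degenerate rather than just under-tuned.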
Excuse me, but I have another question. When I train the model, I always run into "out of memory", like this:
RuntimeError: CUDA out of memory. Tried to allocate 8.05 GiB (GPU 0; 23.62 GiB total capacity; 18.02 GiB already allocated; 2.84 GiB free; 19.59 GiB reserved in total by PyTorch)
I am training on a single GPU with 23.6 GiB of memory. So how did you manage to run the model on only one GPU? Many thanks!
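Not from this repo's code, but a small PyTorch sketch that may help localize the spike: torch.cuda exposes allocation counters, and in RNN-T the joint network output is roughly batch × T × U × vocab, which is a common OOM culprit. Logging memory around the forward and backward of one step shows where the 8 GiB allocation happens (the `model` and `batch` names below are hypothetical placeholders, not this repo's API):

```python
import torch

GIB = 1024 ** 3

def log_gpu_memory(tag: str, device: int = 0) -> None:
    """Print current and peak GPU memory to locate the allocation spike."""
    print(f"[{tag}] allocated={torch.cuda.memory_allocated(device) / GIB:.2f} GiB, "
          f"reserved={torch.cuda.memory_reserved(device) / GIB:.2f} GiB, "
          f"peak={torch.cuda.max_memory_allocated(device) / GIB:.2f} GiB")

# Hypothetical usage around one training step (model/batch are placeholders):
# torch.cuda.reset_peak_memory_stats()
# log_gpu_memory("before forward")
# loss = model(batch)              # RNN-T joint output ~ B x T x U x V often spikes here
# log_gpu_memory("after forward")
# loss.backward()
# log_gpu_memory("after backward")
```

If the spike is in the joint network or loss, the usual mitigations are a smaller batch size and capping utterance and transcript lengths, since the loss lattice grows with their product.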
@PeiyanFlying I am using a rather small batch size, like 8 or 16, on a GeForce 1080 Ti (11 GB VRAM). In fact, multi-GPU seems to be broken at the moment; I am not able to use more than one GPU at this point.
Thank you very much. These days I am working on RNN-T training on LibriSpeech with PyTorch, but with the same config settings as this repository it is easy to run into the OOM problem. I am still investigating. Thanks!
@PeiyanFlying Did you have any success yet? And could you link me to that PyTorch library you're using? I'd like to take a look in case https://github.com/noahchalifour/rnnt-speech-recognition doesn't work for me.
OK, I am working on it. Once the PyTorch library runs successfully, I will give you the link.
@stefan-falk I have also noticed that the model is not converging, and I have been working on a solution for a while. It seems, though, that if you use a small enough dataset (as a test) the model does successfully converge. I did read that in the original paper they use massive batch sizes, and I'm not sure whether that is the reason the model is not converging. Any insights?
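For what it's worth, one way to test the batch-size hypothesis on a single GPU is gradient accumulation, which approximates a large effective batch by accumulating gradients over several small batches before each optimizer step. A minimal PyTorch sketch of the general technique (the model, loss, and loader below are toy stand-ins, not code from this repo):

```python
import torch
from torch import nn

# Toy stand-ins for the real RNN-T model and data loader.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
train_loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(32)]

ACCUM_STEPS = 8  # effective batch = loader batch (4) * 8 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(train_loader):
    loss = loss_fn(model(inputs), targets)
    (loss / ACCUM_STEPS).backward()  # scale so accumulated gradients average, not sum
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This trades wall-clock time for effective batch size, but it would at least show whether a larger effective batch changes convergence.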
@noahchalifour Correct me if I'm wrong... has nobody managed to train the network from this repo to at least 30% WER on LibriSpeech or Common Voice?