deepspeech.torch

High values of WER on Libri Dataset (test-clean and dev-clean)

ismorphism opened this issue 7 years ago · 5 comments

Hi everyone! I have the following problem: whenever I try to train the plain DeepSpeech net with the default parameters (7 layers, 1760 neurons, batch size 20), I get cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-3543/cutorch/lib/THC/generic/THCStorage.cu:66. (My dataset is the default ~600 MB Libri dataset described in the "Data preparation and running" wiki chapter of this repository. My GPU is an Nvidia GTX 1080.)

I assumed this was normal and that I simply had to reduce the batch size and the number of layers/neurons. But the only configuration that works for me so far is 6 layers, 1200 neurons, batch size 12; every other architecture hits the out-of-memory error around epoch 5 or 6. At the same time, my best WER is about 76.5 and doesn't seem to improve. I tried varying the batch size a little, but 12 is my maximum working value. I also tried an LSTM architecture with 600 neurons and 6 layers, and it gave worse results. I experimented with the learning rate, rate annealing, max-norm, and momentum, and enabled batch permutation, but I always end up around 76.5 WER.

Does anyone know what else I could do? Maybe the answer is a bigger batch size and a deeper architecture, but that would require more compute, which isn't a good option for me.

ismorphism avatar Mar 02 '17 07:03 ismorphism

Does the 1080 have 6GB? I'm not sure if that will be able to fit the full model.

If you look back at my responses in the thread on running out of memory, I found some tweaks to the code that drastically reduced memory usage. (But, alas, I haven't had time to commit them...)

In the end, if you don't have much memory, you can't run large batches. (On a 6 GB test card, I don't think I could run more than 3 or 4 samples in a batch.)

As the batch size changes, the ideal learning rate usually changes as well (in my experience). You may have to play with that; this is the real hard work of deep learning.
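One common rule of thumb for the relationship mentioned above (not something stated in this thread or implemented in deepspeech.torch) is the linear scaling heuristic: scale the learning rate in proportion to the batch size. A minimal sketch in Python, with purely illustrative numbers:

```python
# Hypothetical helper, NOT part of deepspeech.torch: the linear
# learning-rate scaling heuristic. The base values below are made up.

def scaled_lr(base_lr, base_batch, new_batch):
    """Scale the learning rate proportionally to the batch size."""
    return base_lr * new_batch / base_batch

# e.g. a rate tuned for batch size 20, reused at batch size 12
print(scaled_lr(3e-4, 20, 12))
```

This is only a starting point; in practice the best rate for a new batch size still has to be found by experiment, as the comment above says.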

mtanana avatar Mar 02 '17 14:03 mtanana

Hi Boris,

Well, you are right: you need a bigger batch size and more GPUs. I think the reason you get the out-of-memory error after 5 or 6 epochs is that, with batch permutation, you suddenly draw a batch whose speech files have many timesteps, and you need to store the intermediate layer values for every timestep for backpropagation. The only way to make it fit is to use a smaller batch size.

Another way around it is to accumulate gradients locally on your machine: if you want 30 samples per batch but can only run 10 samples at a time, accumulate the gradients over 3 batches of 10 samples each and update the weights once after those 3 batches. This will require you to make changes in the code.
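The accumulation scheme described above can be sketched in a few lines. This is not code from deepspeech.torch (which is Lua/Torch); it is a hypothetical Python illustration of the arithmetic, with invented names like `compute_grad` and `apply_update`:

```python
# Hypothetical sketch of gradient accumulation, NOT from deepspeech.torch.
# Gradients are represented as plain lists of floats for illustration.

def accumulated_step(micro_batches, compute_grad, apply_update):
    """Sum gradients over several small batches, average, then update once."""
    total = None
    for batch in micro_batches:
        g = compute_grad(batch)  # gradient for one small batch
        total = g if total is None else [a + b for a, b in zip(total, g)]
    # average so the update matches one large batch of the combined size
    avg = [x / len(micro_batches) for x in total]
    apply_update(avg)  # single weight update after all micro-batches
    return avg

# toy usage: pretend each batch directly yields its gradient vector
batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = accumulated_step(batches, lambda b: b, lambda g: None)
print(avg)  # [3.0, 4.0]
```

The key point is that memory only ever holds one small batch's activations at a time, while the effective batch size for the update is the combined size.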

suhaspillai avatar Mar 02 '17 14:03 suhaspillai

See #71. There's a comment from me near the bottom that helps with memory.

mtanana avatar Mar 02 '17 14:03 mtanana

Sorry for the late response. The GTX 1080 is a great card, but as mentioned above, it only has 8 GB of VRAM. Reduce the minibatch size if you want to train on this GPU!

Which dataset are you training on specifically?

SeanNaren avatar Mar 13 '17 10:03 SeanNaren

Don't forget to downsize the minibatch for testing, too.

mtanana avatar Mar 24 '17 15:03 mtanana