rnnt-speech-recognition
RAM OOM Problem
When I run your code, a RAM OOM happens in the eval part, and I don't know why. My desktop has 128GB of RAM and I'm using 4 GPUs. Memory usage increases on every eval batch, and the 4-GPU batch processing is also slower than on a single GPU.
@kjh21212 I'm facing the same RAM issue, were you able to solve it?
I have same issue. My system is
RAM: 128GB
GPU: GTX 1080 Ti * 4
OS: Ubuntu 18.04
NVIDIA Driver: 440.82
CUDA: 10.1
CUDNN: 7.6.5
Python: 3.6.9
tensorflow & tensorflow-gpu: 2.1.0
(And I do not change any params in run_common_voice.py)
When I run the run_common_voice.py code, this is what happens:

1. At the 0th epoch, Eval_step runs with a retracing warning, and then I get the OOM error.
2. With evaluation disabled at the 0th epoch:
   2-1. When there is a retracing warning (slow):
        Epoch: 0, Batch: 60, Global Step: 60, Step Time: 26.0310, Loss: 165.6244
   2-2. When there is no retracing warning (fast):
        Epoch: 0, Batch: 62, Global Step: 62, Step Time: 6.3741, Loss: 164.6387
   Then I get the OOM error after this line:
        Epoch: 0, Batch: 226, Global Step: 226, Step Time: 5.9092, Loss: 142.7257 ...
I think some of the tf.function usage affects the training speed.
Does the retracing warning have a connection with the OOM error? --> If so, how can I solve the retracing warning? --> If not, how can I solve the OOM error?
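For context on where the warning itself usually comes from, here is a minimal, general TensorFlow sketch (not this repo's code) of a common retracing trigger: calling a tf.function with new Python scalars forces a fresh trace per value, while tensors of the same shape and dtype reuse a single trace.

```python
import tensorflow as tf

# Illustration only: counting how often a tf.function is traced.
trace_count = 0

@tf.function
def square(x):
    global trace_count
    trace_count += 1  # Python side effect: runs only during tracing
    return x * x

for i in range(3):
    square(i)          # each distinct Python int triggers a new trace
python_traces = trace_count

trace_count = 0
for i in range(3):
    square(tf.constant(i, dtype=tf.int32))  # same signature: one trace, reused
tensor_traces = trace_count

print(python_traces, tensor_traces)
```

Each trace builds and keeps a new concrete function, so pervasive retracing can grow memory over time as well as slow steps down.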
@nambee Seems like there's something going on with GradientTape, RNN layers, or TFRecords. I implemented DeepSpeech2 with a TFRecord dataset in Keras; when I trained it using the .fit function there was no OOM error, but when I trained using GradientTape the memory kept going up until OOM. However, when I trained SEGAN (no recurrent network, only Conv) with a generator dataset using GradientTape, it worked fine.
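To make the comparison concrete, here is a minimal self-contained sketch (a toy model and random data, not DeepSpeech2) of the GradientTape-style custom loop described above, with the per-batch step compiled by @tf.function so the Python body isn't re-traced every batch:

```python
import tensorflow as tf

# Toy stand-ins for illustration only.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

@tf.function  # compile the step once instead of running eager Python per batch
def train_step(features, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(features, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for features, labels in dataset:
    loss = train_step(features, labels)
final_loss = float(loss)
```

This is roughly what .fit does internally; if the same loop leaks when written by hand, the difference is likely in what the eager loop retains between batches.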
Please try again with the latest commit. I have updated it to use Tensorflow 2.2.0 and solved the retracing issue
@noahchalifour Just executed the current repository code with one GPU. I am also running into the OOM error, using a GeForce GTX 1080 Ti card.
I have figured out that if we use tf.data.TFRecordDataset, then wrapping the whole dataset loop with @tf.function can avoid the RAM OOM (and also trains faster), like:

```python
@tf.function
def train():
    for batch in train_dataset:
        train_step(batch)
```

The downside of this trick is that we can't use native Python functions or unimplemented TF functions in graph mode (like tf.train.Checkpoint.save()). However, we can use tf.py_function or tf.numpy_function to run them, but then we have to run tf.distribute.Server if we want to train on multiple GPUs. This limitation is mentioned here: https://www.tensorflow.org/api_docs/python/tf/numpy_function?hl=en
@usimarit Are you able to train/use the model? I can only afford a very small batch size (4-8 samples) when running on a single GeForce 1080 Ti (~11 GB RAM), and I am not even sure if it's working.
How long did you have to train your model?
I guess a small batch size is normal for ASR models. I trained a CTC model on an RTX 2080 Ti 11G on a dataset of about 300 hours, and it took 3 days for 12 epochs with batch size 4. But this issue is about RAM OOM, not GPU VRAM OOM :)) I've tested multiple times using TFRecordDataset, and it seems like there are some bugs when iterating it with a for loop.
@usimarit Oh, I misinterpreted the issue then.
Yeah, that's the batch size I am using too. Didn't expect such a small batch size to work out :)
Please try again with the latest commit. I have updated it to use Tensorflow 2.2.0 and solved the retracing issue
@noahchalifour But I'm also facing the problem, even with Tensorflow 2.2.0 and the latest commit.
@usimarit I have tried it, but it still doesn't work.