tensor2tensor
Out of Memory while training
I am getting an out-of-memory (OOM) error when training with 8 GPUs, but not with 1 GPU.
I use the following command to train:
```
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='max_length=100,batch_size=1024,eval_drop_long_sequences=true' \
  --worker_gpu=8 \
  --train_steps=350000 \
  --eval_steps=5000 \
  --output_dir=$TRAIN_DIR \
  --schedule=continuous_train_and_eval
```
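One thing worth checking (a hedged guess, not confirmed for your exact hparams set): in tensor2tensor's Transformer hparams sets, `batch_size` is measured in tokens *per GPU*, so `--worker_gpu=8` multiplies the global batch rather than splitting it across devices. The toy calculation below illustrates why per-GPU memory does not shrink as GPUs are added:

```python
# Illustrative sketch only: assumes batch_size is interpreted as
# tokens per GPU, as in t2t's Transformer hparams sets.

def tokens_per_gpu(batch_size, worker_gpu):
    """Each worker builds its own batches, so per-GPU load is unchanged."""
    return batch_size  # independent of worker_gpu

def total_tokens_per_step(batch_size, worker_gpu):
    """The effective global batch grows linearly with the GPU count."""
    return batch_size * worker_gpu

print(tokens_per_gpu(1024, 8))         # per-GPU load with 8 GPUs: 1024
print(total_tokens_per_step(1024, 8))  # global batch per step: 8192
```

If the multi-GPU run also holds extra state (e.g. replicated variables or larger input pipelines), that overhead on top of the unchanged per-GPU batch could explain why 8 GPUs OOM where 1 GPU does not.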
Any suggestions? I also tried reducing `batch_size` as well as `max_length`, but no luck.
Same question here.
It seems reducing `batch_size` does not make a difference.