transformer-xl
TF base model memory requirements
Hi, I was trying to reproduce the WT103 LM results with the TF models. The PyTorch version trains fine with batch size 60 on 4x V100 16GB GPUs. However, when I switch to the TensorFlow version, I always hit an out-of-memory error, even after reducing the batch size to 40 on the same 4 GPUs.
May I know the estimated GPU memory requirements for training both the tf_wt103 base and large models?
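For reference, here is a rough lower bound from parameter counts alone (a back-of-the-envelope sketch; the ~151M/~257M figures are the parameter counts reported in the Transformer-XL paper for the WT103 models, assumed here rather than read from this repo's configs):

```python
# Static float32 + Adam memory per replica: weights + gradients + two
# optimizer moment slots, i.e. 16 bytes per parameter, before activations.
def static_train_gib(n_params, bytes_per_value=4, adam_slots=2):
    total_bytes = n_params * bytes_per_value * (2 + adam_slots)
    return total_bytes / 2**30

for name, n_params in [("wt103 base (~151M params)", 151_000_000),
                       ("wt103 large (~257M params)", 257_000_000)]:
    print(f"{name}: ~{static_train_gib(n_params):.1f} GiB before activations")
```

Activations, the recurrence memory, and the adaptive softmax come on top of this, so the real per-GPU footprint is much higher.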
```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4289760,410] and type float
on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[node concat_1 (defined at /home/qingyu.tan/projects/pos-embeddings/pos-transformer-xl/tf/gpu_utils.py:34)
     = ConcatV2[N=20, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](
       gradients/transformer/adaptive_softmax/cutoff_3/strided_slice_grad/StridedSliceGrad,
       gradients/transformer/adaptive_softmax/cutoff_2/strided_slice_grad/StridedSliceGrad,
       gradients/transformer/adaptive_softmax/cutoff_1/strided_slice_grad/StridedSliceGrad,
       gradients/transformer/adaptive_softmax/cutoff_0/strided_slice_grad/StridedSliceGrad,
       gradients/transformer/adaptive_embed/embedding_lookup_grad/Reshape,
       gradients_1/transformer_1/adaptive_softmax/cutoff_3/strided_slice_grad/StridedSliceGrad/_8853,
       gradients_1/transformer_1/adaptive_softmax/cutoff_2/strided_slice_grad/StridedSliceGrad/_8855,
       gradients_1/transformer_1/adaptive_softmax/cutoff_1/strided_slice_grad/StridedSliceGrad/_8857,
       gradients_1/transformer_1/adaptive_softmax/cutoff_0/strided_slice_grad/StridedSliceGrad/_8859,
       gradients_1/transformer_1/adaptive_embed/embedding_lookup_grad/Reshape/_8861,
       gradients_2/transformer_2/adaptive_softmax/cutoff_3/strided_slice_grad/StridedSliceGrad/_8863,
       gradients_2/transformer_2/adaptive_softmax/cutoff_2/strided_slice_grad/StridedSliceGrad/_8865,
       gradients_2/transformer_2/adaptive_softmax/cutoff_1/strided_slice_grad/StridedSliceGrad/_8867,
       gradients_2/transformer_2/adaptive_softmax/cutoff_0/strided_slice_grad/StridedSliceGrad/_8869,
       gradients_2/transformer_2/adaptive_embed/embedding_lookup_grad/Reshape/_8871,
       gradients_3/transformer_3/adaptive_softmax/cutoff_3/strided_slice_grad/StridedSliceGrad/_8873,
       gradients_3/transformer_3/adaptive_softmax/cutoff_2/strided_slice_grad/StridedSliceGrad/_8875,
       gradients_3/transformer_3/adaptive_softmax/cutoff_1/strided_slice_grad/StridedSliceGrad/_8877,
       gradients_3/transformer_3/adaptive_softmax/cutoff_0/strided_slice_grad/StridedSliceGrad/_8879,
       gradients_3/transformer_3/adaptive_embed/embedding_lookup_grad/Reshape/_8881,
       concat/axis)]]
  Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[{{node VerifyFinite/control_dependency/_8889}} = _Recv[client_terminated=false,
       recv_device="/job:localhost/replica:0/task:0/device:CPU:0",
       send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1,
       tensor_name="edge_67852_VerifyFinite/control_dependency", tensor_type=DT_FLOAT,
       _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
  Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
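For what it's worth, the single tensor that fails to allocate is already enormous: a float32 tensor of shape [4289760, 410] takes about 6.5 GiB on its own, i.e. nearly half of one 16GB V100 before any activations or optimizer state. A quick check (plain Python, just the arithmetic from the shape in the traceback):

```python
# Size of the tensor the allocator could not fit (shape from the traceback).
rows, cols, bytes_per_float32 = 4_289_760, 410, 4
size_bytes = rows * cols * bytes_per_float32
print(f"{size_bytes / 2**30:.2f} GiB")  # -> 6.55 GiB
```

Following the hint in the error message, TF 1.x can list live allocations when the OOM happens (a minimal sketch; `sess`, `loss`, and `train_op` stand in for this repo's actual session and fetches):

```python
import tensorflow as tf

# Report all live tensor allocations if this step runs out of memory.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)
sess.run([loss, train_op], options=run_options)
```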