
OOM on 345M with GPU

Open · babaraza opened this issue on Feb 01 '21 · 4 comments

Hi, I am getting an OOM error when trying to train the 345M model on GPU. If I use the CPU it trains fine, albeit very slowly. I am using an Nvidia GTX 1070 and have CUDA and cuDNN installed.

Both interactive_conditional_samples.py and generate_unconditional_samples.py work fine on the GPU, so I know the GPU itself is working. I only hit the OOM when trying to train.

I tried using the "--optimizer sgd" flag with the default batch_size of 1: python train.py --dataset data.npz --model_name 345M --optimizer sgd
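For reference, here's the kind of session tweak I've seen suggested for TF 1.x OOMs (a sketch, not something in the repo as-is; it assumes train.py builds its session from a tf.ConfigProto, which the TF 1.x training scripts generally do, and it won't create memory that isn't there):

```python
# Sketch only: assumes train.py constructs its tf.Session from a ConfigProto.
# This stops TensorFlow from reserving nearly all VRAM up front; it can help
# with fragmentation, but it cannot fix a genuine shortage of memory.
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
# Or hard-cap TensorFlow at a fraction of the card's memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.9

sess = tf.compat.v1.Session(config=config)
```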

Error (truncated):

2021-01-31 21:52:28.744480: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00MiB (rounded to 4194304). Current allocation summary follows.
2021-01-31 21:52:28.744901: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 276, Chunks in use: 266. 69.0KiB allocated for chunks. 66.5KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-01-31 21:52:28.745756: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-01-31 21:52:28.746117: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 3, Chunks in use: 1. 4.0KiB allocated for chunks. 1.3KiB in use in bin. 1.0KiB client-requested in use in bin.

...

2021-01-31 21:52:28.966844: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 6.64GiB
2021-01-31 21:52:28.966890: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 7135713536 memory_limit_: 7135713690 available bytes: 154 curr_region_allocation_bytes_: 8589934592
2021-01-31 21:52:28.966950: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: Limit: 7135713690 InUse: 7134272000 MaxInUse: 7134274048 NumAllocs: 3078 MaxAllocSize: 268435456

2021-01-31 21:52:28.967091: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ***************************************


2021-01-31 21:52:28.967156: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[1,16,1024,1024] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
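If my back-of-the-envelope math is right (my own estimate, not from the logs), the numbers roughly add up: with Adam, each parameter needs the weight, its gradient, and two optimizer slots, which already eats most of the ~6.6 GiB Limit above before any activations. SGD roughly halves that, which I assume is why the --optimizer sgd suggestion exists:

```python
# Back-of-envelope memory math for fine-tuning GPT-2 345M in fp32
# (illustrative only; ignores activations, cuDNN workspace, fragmentation).
params = 345e6
fp32 = 4  # bytes per parameter

weights = params * fp32         # ~1.3 GiB
grads = params * fp32           # ~1.3 GiB
adam_slots = 2 * params * fp32  # Adam's m and v slots: ~2.6 GiB

print(f"SGD  (weights + grads):        {(weights + grads) / 2**30:.1f} GiB")
print(f"Adam (weights + grads + m,v):  {(weights + grads + adam_slots) / 2**30:.1f} GiB")

# The tensor in the OOM message matches a per-layer attention mask for the
# 345M config (16 heads, 1024 context): [1, 16, 1024, 1024] bool = 16 MiB.
# It's the last straw, not the root cause.
print(f"Attention mask: {16 * 1024 * 1024 / 2**20:.0f} MiB")
```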

Any idea on how to resolve this?

babaraza commented on Feb 01 '21 04:02

On a single 1070? I don't think that's possible. I'm currently training on a 10GB dataset using the 345M model on a 3090, and it's using ~17GB of VRAM:

[Screenshot: 345M training session showing ~17GB VRAM usage]

jaimu97 commented on Feb 23 '21 08:02

Thank you for your reply. I am trying to get my hands on a 3080 or 3090 for this very reason, and your screenshot and message just confirmed I need the 3090!

babaraza commented on Feb 23 '21 15:02

I honestly couldn't recommend getting a 3090 just for training/fine-tuning 345M gpt-2. 117M is definitely good enough for every use case (for me, anyway) if your 1070 can handle that. I only use it to train when I'm at work. ;)

jaimu97 commented on Feb 23 '21 20:02

I agree; it won't solely be for training, I just wanted to justify it for my gaming needs ;) With that said, it's almost impossible to find a 3080/3090 for a good price, so I'm waiting it out. I do appreciate everyone's input. I did train the 117M model and it does seem to give good results. There are also the Google Colab notebooks that everyone's been using to train, since they give you access to an Nvidia T4 for free.
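If anyone else goes the Colab route, here's a quick sanity check (illustrative, not from any particular notebook) that the free GPU is actually visible to TensorFlow before kicking off a long run:

```python
# Quick check (illustrative) that a GPU is visible to TensorFlow 1.x,
# e.g. the free T4 in a Colab runtime, before starting a long training job.
import tensorflow as tf

device = tf.test.gpu_device_name()
if device:
    print(f"GPU found: {device}")  # typically '/device:GPU:0'
else:
    print("No GPU detected; training would fall back to CPU.")
```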

babaraza commented on Feb 24 '21 04:02