gpt-2
774M Model running out of memory
2019-08-20 22:56:50.301264: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8e6a00 next 222 of size 256
2019-08-20 22:56:50.301278: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8e6b00 next 224 of size 5120
2019-08-20 22:56:50.301307: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8e7f00 next 225 of size 256
2019-08-20 22:56:50.301330: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8e8000 next 227 of size 20480
2019-08-20 22:56:50.301339: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8ed000 next 228 of size 5120
2019-08-20 22:56:50.301347: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8ee400 next 229 of size 5120
2019-08-20 22:56:50.301355: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8ef800 next 230 of size 5120
2019-08-20 22:56:50.301381: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f0c00 next 231 of size 5120
2019-08-20 22:56:50.301387: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f2000 next 232 of size 5120
2019-08-20 22:56:50.301399: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f3400 next 233 of size 5120
2019-08-20 22:56:50.301408: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f4800 next 234 of size 256
2019-08-20 22:56:50.301416: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f4900 next 236 of size 5120
2019-08-20 22:56:50.301425: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f5d00 next 237 of size 256
2019-08-20 22:56:50.301433: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f5e00 next 238 of size 15360
2019-08-20 22:56:50.301442: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f9a00 next 239 of size 256
2019-08-20 22:56:50.301450: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8f9b00 next 242 of size 5120
2019-08-20 22:56:50.301459: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8faf00 next 243 of size 256
2019-08-20 22:56:50.312661: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7f4cfe8fb000 next 18446744073709551615 of size 20480
2019-08-20 22:56:50.312681: I tensorflow/core/common_runtime/bfc_allocator.cc:809] Summary of in-use Chunks by size:
2019-08-20 22:56:50.312699: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 411 Chunks of size 256 totalling 102.8KiB
2019-08-20 22:56:50.312710: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 1280 totalling 1.2KiB
2019-08-20 22:56:50.312720: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 16 Chunks of size 4096 totalling 64.0KiB
2019-08-20 22:56:50.312732: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 940 Chunks of size 5120 totalling 4.59MiB
2019-08-20 22:56:50.312741: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 144 Chunks of size 15360 totalling 2.11MiB
2019-08-20 22:56:50.312750: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 144 Chunks of size 20480 totalling 2.81MiB
2019-08-20 22:56:50.312760: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 8 Chunks of size 81920 totalling 640.0KiB
2019-08-20 22:56:50.312770: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 8 Chunks of size 4194304 totalling 32.00MiB
2019-08-20 22:56:50.312779: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 79 Chunks of size 5242880 totalling 395.00MiB
2019-08-20 22:56:50.312789: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 5246976 totalling 5.00MiB
2019-08-20 22:56:50.312798: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 143 Chunks of size 6553600 totalling 893.75MiB
2019-08-20 22:56:50.312808: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 9134592 totalling 8.71MiB
2019-08-20 22:56:50.312821: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 141 Chunks of size 19660800 totalling 2.58GiB
2019-08-20 22:56:50.312831: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 28 Chunks of size 20971520 totalling 560.00MiB
2019-08-20 22:56:50.312841: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 22806528 totalling 21.75MiB
2019-08-20 22:56:50.312850: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 283 Chunks of size 26214400 totalling 6.91GiB
2019-08-20 22:56:50.312860: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 32243712 totalling 30.75MiB
2019-08-20 22:56:50.312869: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 32505856 totalling 31.00MiB
2019-08-20 22:56:50.312879: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 33554432 totalling 64.00MiB
2019-08-20 22:56:50.312888: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 7 Chunks of size 37748736 totalling 252.00MiB
2019-08-20 22:56:50.312898: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 38061056 totalling 36.30MiB
2019-08-20 22:56:50.312909: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 40894464 totalling 39.00MiB
2019-08-20 22:56:50.312918: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 50394368 totalling 48.06MiB
2019-08-20 22:56:50.312927: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 28 Chunks of size 83886080 totalling 2.19GiB
2019-08-20 22:56:50.312937: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 141774848 totalling 135.21MiB
2019-08-20 22:56:50.312947: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 142606336 totalling 136.00MiB
2019-08-20 22:56:50.312956: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 257315840 totalling 245.40MiB
2019-08-20 22:56:50.312966: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 14.55GiB
2019-08-20 22:56:50.312975: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 15652398592 memory_limit_: 15652398695 available bytes: 103 curr_region_allocation_bytes_: 17179869184
2019-08-20 22:56:50.312997: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 15652398695
InUse: 15626903296
MaxInUse: 15647874816
NumAllocs: 4685
MaxAllocSize: 257315840
2019-08-20 22:56:50.313127: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
2019-08-20 22:56:50.313180: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[1,20,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,20,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node gradients/model/h18/attn/truediv_1_grad/Neg}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./train.py", line 291, in <module>
main()
File "./train.py", line 269, in main
feed_dict={context: sample_batch()})
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,20,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node gradients/model/h18/attn/truediv_1_grad/Neg (defined at /home/surya/gpt-2/src/memory_saving_gradients.py:216) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Errors may have originated from an input operation.
Input Source operations connected to node gradients/model/h18/attn/truediv_1_grad/Neg:
model/h18/attn/Exp_1 (defined at /home/surya/gpt-2/src/memory_saving_gradients.py:204)
Original stack trace for 'gradients/model/h18/attn/truediv_1_grad/Neg':
File "./train.py", line 291, in <module>
main()
File "./train.py", line 138, in main
opt_grads = memory_saving_gradients.gradients(loss, train_vars)
File "/home/surya/gpt-2/src/memory_saving_gradients.py", line 216, in gradients
dv = tf_gradients(ys=copied_ys, xs=boundary+xs, grad_ys=grad_ys, **kwargs)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 731, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 403, in _MaybeCompile
return grad_fn() # Exit early
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 731, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/ops/math_grad.py", line 1147, in _RealDivGrad
grad * math_ops.realdiv(math_ops.realdiv(-x, y), y), ry),
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6633, in neg
"Neg", x=x, name=name)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'model/h18/attn/truediv_1', defined at:
File "./train.py", line 291, in <module>
main()
[elided 0 identical lines from previous traceback]
File "./train.py", line 138, in main
opt_grads = memory_saving_gradients.gradients(loss, train_vars)
File "/home/surya/gpt-2/src/memory_saving_gradients.py", line 204, in gradients
copied_sgv, info = ge.copy_with_input_replacements(ge.sgv(ops_to_copy), {})
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/contrib/graph_editor/transform.py", line 672, in copy_with_input_replacements
sgv, dst_graph, dst_scope, src_scope, reuse_dst_scope=reuse_dst_scope)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/contrib/graph_editor/transform.py", line 452, in __call__
self._copy_ops(info)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/contrib/graph_editor/transform.py", line 466, in _copy_ops
op_, op_outputs_ = self.transform_op_handler(info, op, new_inputs)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/contrib/graph_editor/transform.py", line 176, in copy_op_handler
[], input_types_, None, op_def_)
File "/root/miniconda3/envs/tft/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
Running on a personal machine with GPUs and everything installed. The 345M model trained fine, but I'm running into memory issues with 774M.
I made sure memory-saving gradients were enabled and the batch size was just 1. Any suggestions?
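In case it helps anyone reproduce the allocator dump above: the `report_tensor_allocations_upon_oom` hint from the log can be turned on by passing `RunOptions` to `sess.run`. A minimal, self-contained TF 1.x sketch (the tiny placeholder graph is just so the snippet runs on its own; in train.py the options would go on the existing `sess.run` call that OOMs):

```python
import tensorflow as tf  # TF 1.x, matching the traceback above

# Ask the BFC allocator to dump the list of live tensors when an OOM occurs,
# as suggested by the hint in the error message.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Tiny placeholder graph so this snippet is runnable by itself.
x = tf.placeholder(tf.float32, shape=[None, 1024])
y = tf.reduce_sum(tf.square(x))

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [[0.0] * 1024]}, options=run_options))
```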
Same here :+1:
We are looking for a solution here: https://github.com/minimaxir/gpt-2-simple/issues/108
I was able to get it working on a Tesla P40 GPU (24GB). It still hit OOM with Adam, but switching the optimizer to SGD made it work.
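If it helps, the swap is just a matter of building a different optimizer over the same loss. A rough TF 1.x sketch of what I mean (the dummy loss and the learning rate are illustrative, not the repo's actual values):

```python
import tensorflow as tf  # TF 1.x

# Dummy stand-in for the GPT-2 language-model loss built in train.py.
w = tf.Variable(tf.random_normal([1024, 1024]))
loss = tf.reduce_mean(tf.square(w))
train_vars = tf.trainable_variables()

# Adam keeps two extra moment tensors per trainable parameter, so for a
# 774M-parameter model the optimizer state alone is several GB.
# opt = tf.train.AdamOptimizer(learning_rate=1e-4)

# Plain SGD keeps no per-parameter state, so only the weights, activations
# and gradients have to fit on the GPU.
opt = tf.train.GradientDescentOptimizer(learning_rate=1e-4)
train_op = opt.minimize(loss, var_list=train_vars)
```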
I think there was another person who said they got it working on a 24GB GPU. The unfortunate part is that gcloud only offers the V100 at most, and that's all I and many others have access to at the moment. That's why I'm trying to find new optimizers or ways to distribute the memory (although historically that has been hard for fully connected transformers). More here: https://github.com/dantuluri/gpt-2
Yep, it's really hard to find anything larger than 16GB, but Azure does offer a 24GB instance (ND6s, in case someone wants to try it out).
Works fine with SGD on a Titan RTX (24GB).
@mgrankin is there some kind of cloud provider that provides RTX?
@saippuakauppias I don't know of any. Nvidia discourages the use of RTX cards in datacenters.
Just FYI, I have been able to fine-tune a subset of variables under Colab with good results. My forked notebook (via Tenoke's fork) demonstrating how to do this is here: https://github.com/jkraybill/gpt-2/blob/finetuning/GPT2-finetuning2-774M.ipynb
I got the tip from an under-the-radar tweet by BasedBlue. (See the comments here: https://github.com/minimaxir/gpt-2-simple/issues/108#issuecomment-533842411)
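The gist of it is to hand the optimizer only a subset of the model's variables, so gradients (and any optimizer slots) are only allocated for those layers. A rough sketch of the filtering step, assuming the `model/h<N>/` per-block variable naming used by the GPT-2 graph (which blocks to train is an illustrative choice here, not the notebook's exact list):

```python
import tensorflow as tf  # TF 1.x

all_vars = tf.trainable_variables()

# 774M has 36 transformer blocks (model/h0 ... model/h35); only fine-tune
# the last few blocks plus the final layer norm.
train_prefixes = ('model/h33/', 'model/h34/', 'model/h35/', 'model/ln_f/')
train_vars = [v for v in all_vars if v.name.startswith(train_prefixes)]

# Gradients (and optimizer state) then only get created for train_vars,
# e.g. opt_grads = memory_saving_gradients.gradients(loss, train_vars)
```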
@saippuakauppias FWIW, I just learned today that Linode is running a paid pilot with 24GB Quadro RTX cards. I haven't used them personally.
@jkraybill Thanks! But it's not clear whether they can be rented on an hourly basis; for a whole month it's insanely expensive...
I started fine-tuning the model on a 48GB GPU today. With the Adam optimizer it uses 33GB of memory, and I have even been able to increase the batch size thanks to the GPU's extra capacity.
@nkk0, the Adafactor optimizer + checkpointing uses ~8GB of GPU RAM with a batch size of 1. Read all my comments in issue 108 in the minimaxir repo (link above).
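For anyone who doesn't want to read the whole thread: Adafactor keeps factored second-moment statistics instead of Adam's full per-parameter moments, which is where most of the memory saving comes from. A minimal sketch using the AdafactorOptimizer from the tensor2tensor package (an assumption on my part; any TF 1.x Adafactor implementation should slot in the same way, and the dummy loss and learning rate are illustrative):

```python
import tensorflow as tf  # TF 1.x
from tensor2tensor.utils.adafactor import AdafactorOptimizer

# Stand-in loss; in train.py this would be the LM cross-entropy.
w = tf.Variable(tf.random_normal([1024, 1024]))
loss = tf.reduce_mean(tf.square(w))

# Adafactor factors the second-moment accumulator over rows and columns, so
# its state per weight matrix is O(rows + cols) rather than O(rows * cols).
opt = AdafactorOptimizer(learning_rate=1e-3)
train_op = opt.minimize(loss)
```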