pix2pixHD
CUDA out of memory OR gradient overflow with fp16
Hi all, I am getting either CUDA out of memory errors or gradient overflow when I enable the fp16 option. Training runs normally without fp16. Any ideas?
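For context, pix2pixHD's fp16 path goes through NVIDIA apex amp at opt_level O1 (as the traceback below confirms). A minimal sketch of that pattern with a toy stand-in model, paraphrased from memory rather than the exact train.py code:

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA apex, assumed installed with CUDA support

model = nn.Linear(16, 1).cuda()  # toy stand-in for the pix2pixHD networks
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# O1 patches torch functions so eligible ops run in fp16 while master
# weights stay in fp32; loss_scale defaults to 'dynamic'.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

x = torch.randn(8, 16, device='cuda')
loss = model(x).pow(2).mean()

# scale_loss multiplies the loss by the current loss scale so that small
# fp16 gradients don't underflow to zero in backward().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```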
Here is the gradient overflow log:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
[... 83 more lines: the scale keeps halving on every step, never recovering ...]
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.117582368135751e-22
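What this log shows is apex's dynamic loss scaling reacting to non-finite gradients: each overflow halves the scale and skips the optimizer step, and the scale only grows back after a long run of clean steps. A paraphrased sketch of that logic (a toy version, not the actual apex source, which lives in apex/amp/scaler.py per the traceback below):

```python
def update_scale(scale, found_overflow, good_steps, scale_window=2000):
    """Toy version of dynamic loss scaling: halve on overflow and skip
    the step; double again after scale_window consecutive clean steps."""
    if found_overflow:
        return scale / 2.0, 0   # halve the scale, reset clean-step counter
    good_steps += 1
    if good_steps >= scale_window:
        return scale * 2.0, 0   # try a larger scale again
    return scale, good_steps
```

Since the scale here halves on every single step without ever recovering, the forward or backward pass is almost certainly producing NaN/Inf values regardless of the scale. Training then dies with: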
Traceback (most recent call last):
  File "train.py", line 85, in <module>
  File "/home//anaconda3/envs/pix2pixHD/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home//anaconda3/envs/pix2pixHD/lib/python3.7/site-packages/apex/amp/handle.py", line 127, in scale_loss
    should_skip = False if delay_overflow_check else loss_scaler.update_scale()
  File "/home//anaconda3/envs/pix2pixHD/lib/python3.7/site-packages/apex/amp/scaler.py", line 200, in update_scale
    self._has_overflow = self._overflow_buf.item()
RuntimeError: CUDA error: an illegal memory access was encountered
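For reference, amp.initialize also accepts a fixed number for loss_scale instead of 'dynamic', which can help localize the problem. An untested sketch, continuing the toy example above:

```python
# Pin the loss scale so a run of bad batches can't drive it toward zero.
# If every step still overflows at a small fixed scale, the gradients
# themselves contain NaN/Inf and the culprit is in the model or the loss.
model, optimizer = amp.initialize(model, optimizer,
                                  opt_level='O1', loss_scale=128.0)
```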
Have you solved it?