
oom when allocating tensor with shape

Open Silomat opened this issue 4 years ago • 3 comments

THIS IS NOT TECH SUPPORT FOR NEWBIE FAKERS POST ONLY ISSUES RELATED TO BUGS OR CODE

Expected behavior

I was trying to train the AI, but instead I got this error:

Running trainer.

[new] No saved models found. Enter a name of a new model : mal mal

Model first run.

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
[0] : NVIDIA GeForce GTX 1650

[0] Which GPU indexes to choose? : 0

[0] Autobackup every N hour ( 0..24 ?:help ) : 0.5
0
[n] Write preview history ( y/n ?:help ) : n
[300000] Target iteration : 300000
[n] Flip SRC faces randomly ( y/n ?:help ) : n
[y] Flip DST faces randomly ( y/n ?:help ) : y
[10] Batch_size ( ?:help ) : 10
[128] Resolution ( 64-640 ?:help ) : 128
[f] Face type ( h/mf/f/wf/head ?:help ) : f
[liae-ud] AE architecture ( ?:help ) : liae-ud
[256] AutoEncoder dimensions ( 32-1024 ?:help ) : 256
[64] Encoder dimensions ( 16-256 ?:help ) : 64
[64] Decoder dimensions ( 16-256 ?:help ) : 64
[22] Decoder mask dimensions ( 16-256 ?:help ) : 22
[n] Eyes and mouth priority ( y/n ?:help ) : n
[n] Uniform yaw distribution of samples ( y/n ?:help ) : n
[n] Blur out mask ( y/n ?:help ) : n
[y] Place models and optimizer on GPU ( y/n ?:help ) : y
[y] Use AdaBelief optimizer? ( y/n ?:help ) : y
[n] Use learning rate dropout ( n/y/cpu ?:help ) : n
[y] Enable random warp of samples ( y/n ?:help ) : y
[0.0] Random hue/saturation/light intensity ( 0.0 .. 0.3 ?:help ) : 0.0
[0.0] GAN power ( 0.0 .. 5.0 ?:help ) : 0.0
[0.0] Face style power ( 0.0..100.0 ?:help ) : 0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) : 0.0
[none] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : none
[y] Enable gradient clipping ( y/n ?:help ) : y
[y] Enable pretraining mode ( y/n ?:help ) : y
Initializing models: 100%|###############################################################| 5/5 [00:01<00:00, 2.51it/s]
Loaded 15843 packed faces from D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\pretrain_faces
Sort by yaw: 100%|##################################################################| 128/128 [00:00<00:00, 361.55it/s]
Sort by yaw: 100%|##################################################################| 128/128 [00:00<00:00, 368.88it/s]
================== Model Summary ===================
==                                                ==
==            Model name: mal_SAEHD              ==
==                                                ==
==     Current iteration: 0                      ==
==                                                ==
==---------------- Model Options -----------------==
==                                                ==
==            resolution: 128                    ==
==             face_type: f                      ==
==     models_opt_on_gpu: True                   ==
==                 archi: liae-ud                ==
==               ae_dims: 256                    ==
==                e_dims: 64                     ==
==                d_dims: 64                     ==
==           d_mask_dims: 22                     ==
==       masked_training: True                   ==
==       eyes_mouth_prio: False                  ==
==           uniform_yaw: True                   ==
==         blur_out_mask: False                  ==
==             adabelief: True                   ==
==            lr_dropout: n                      ==
==           random_warp: False                  ==
==      random_hsv_power: 0.0                    ==
==       true_face_power: 0.0                    ==
==      face_style_power: 0.0                    ==
==        bg_style_power: 0.0                    ==
==               ct_mode: none                   ==
==              clipgrad: True                   ==
==              pretrain: True                   ==
==       autobackup_hour: 0                      ==
== write_preview_history: False                  ==
==           target_iter: 300000                 ==
==       random_src_flip: False                  ==
==       random_dst_flip: True                   ==
==            batch_size: 10                     ==
==             gan_power: 0.0                    ==
==        gan_patch_size: 16                     ==
==              gan_dims: 16                     ==
==                                                ==
==------------------ Running On ------------------==
==                                                ==
==          Device index: 0                      ==
==                  Name: NVIDIA GeForce GTX 1650 ==
==                  VRAM: 2.86GB                 ==
==                                                ==

Starting. Target iteration: 300000. Press "Enter" to stop training and save model.

Trying to do the first iteration. If an error occurs, reduce the model parameters.

!!! Windows 10 users IMPORTANT notice. You should set this setting in order to work correctly. https://i.imgur.com/B7cmDCB.jpg !!!

Error: OOM when allocating tensor with shape[10,128,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node DepthToSpace_13 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\ops\__init__.py:345) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[node concat_8 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py:563) ]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'DepthToSpace_13', defined at:
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 425, in on_initialize
    gpu_pred_dst_dst, gpu_pred_dst_dstm = self.decoder(gpu_dst_code)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 225, in forward
    x = self.upscale2(x)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 73, in forward
    x = nn.depth_to_space(x, 2)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\ops\__init__.py", line 345, in depth_to_space
    return tf.depth_to_space(x, size, data_format=nn.data_format)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2703, in depth_to_space
    return gen_array_ops.depth_to_space(input, block_size, data_format, name=name)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1593, in depth_to_space
    data_format=data_format, name=name)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10,128,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node DepthToSpace_13 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\ops\__init__.py:345) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[node concat_8 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py:563) ]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,128,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node DepthToSpace_13}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node concat_8}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
    losses = self.onTrainOneIter()
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
    src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
    self.target_dstm_em:target_dstm_em,
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10,128,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node DepthToSpace_13 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\ops\__init__.py:345) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[node concat_8 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py:563) ]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'DepthToSpace_13', defined at:
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 425, in on_initialize
    gpu_pred_dst_dst, gpu_pred_dst_dstm = self.decoder(gpu_dst_code)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 225, in forward
    x = self.upscale2(x)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 73, in forward
    x = nn.depth_to_space(x, 2)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\ops\__init__.py", line 345, in depth_to_space
    return tf.depth_to_space(x, size, data_format=nn.data_format)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2703, in depth_to_space
    return gen_array_ops.depth_to_space(input, block_size, data_format, name=name)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1593, in depth_to_space
    data_format=data_format, name=name)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10,128,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node DepthToSpace_13 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\ops\__init__.py:345) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[node concat_8 (defined at D:\lol\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\models\Model_SAEHD\Model.py:563) ]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

I have no idea what that could mean; theoretically I've done everything correctly.

Silomat avatar Dec 03 '21 21:12 Silomat
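The repeated "Hint" lines in the log refer to TensorFlow 1.x RunOptions. As a point of reference, here is a minimal sketch of how that flag is passed in generic TF 1.x session code (not a DeepFaceLab patch; the tensor shape below simply mirrors the failing allocation):

    import tensorflow as tf

    # Sketch only: generic TF 1.x usage of the flag mentioned in the hint.
    # The tensor mirrors the shape of the allocation that failed in the log.
    x = tf.zeros([10, 128, 64, 64], dtype=tf.float32)

    # Ask TF to report which tensors are alive if this run hits an OOM.
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

    with tf.Session() as sess:
        sess.run(x, options=run_options)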

My guess is that you ran out of GPU memory (VRAM), since the allocator is GPU_0_bfc:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10,128,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Lowering your settings or getting a GPU with more VRAM seems to be the usual answer to this problem :)

qkum avatar Dec 13 '21 02:12 qkum
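For a rough sense of why lowering the settings helps: the failing activation has shape [10, 128, 64, 64] in float32, and its size scales linearly with batch size and roughly quadratically with spatial resolution. A back-of-the-envelope sketch in plain Python using only the numbers from the log (this is one activation out of many in the graph, so the figures are illustrative, not a full VRAM budget):

    # Size of the single activation that failed to allocate, from the log:
    # shape [batch, channels, height, width] = [10, 128, 64, 64], float32 (4 bytes).
    batch, channels, height, width = 10, 128, 64, 64
    bytes_per_float = 4

    tensor_bytes = batch * channels * height * width * bytes_per_float
    print(tensor_bytes / 1024**2)       # 20.0 MB for this one tensor

    # Halving the batch size halves it; the training graph holds many such
    # activations, which is why batch size and resolution are reduced first.
    print(tensor_bytes // 2 / 1024**2)  # 10.0 MB at batch_size 5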

Did you ever find the answer? If so, would you mind sharing it and closing this issue?

joolstorrentecalo avatar Jun 08 '23 23:06 joolstorrentecalo

Yes, I found the solution: you have to lower the settings, such as batch size and resolution.

kamranahmed786 avatar Sep 19 '23 13:09 kamranahmed786
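For example, the prompts from the original log could be answered more conservatively on a card that, like the GTX 1650 here, reports only 2.86GB of usable VRAM. These are illustrative values only, not guaranteed to fit:

    [10] Batch_size ( ?:help ) : 4
    [128] Resolution ( 64-640 ?:help ) : 96
    [y] Place models and optimizer on GPU ( y/n ?:help ) : n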