
Error when trying to utilise existing models/pre-trained models

Open · Loz13 opened this issue 3 years ago • 3 comments

I've searched through the forums and tried numerous fixes for at least 6 hours without luck... here's the issue.

I can successfully train a new SAEHD model. However, whenever I try to use existing models or pre-trained models, it reports "2 root error(s) found". I have tried the following fixes, but the issue still persists:

  • Used a range of other pre-trained and trained models, including:
    https://github.com/iperov/DeepFaceLab/releases/tag/DF.wf.288res.384.92.72.22
    https://mrdeepfakes.com/forums/thread-sharing-dfl-2-0-saehd-models
  • Downloaded a fresh copy of the 20.11.2021 RTX 3000-series build (I have an RTX 3070 Ti)
  • Turned on "hardware-accelerated GPU scheduling" in Windows
  • Changed a line in "$HOME/DeepFaceLab_Linux/DeepFaceLab/core/imagelib/warp.py", which now lets me train via SAEHD without issue (see the snippet after this list). Original line: random_transform_mat = cv2.getRotationMatrix2D((w // 2, w // 2), rotation, scale). Updated line: random_transform_mat = cv2.getRotationMatrix2D((int(w // 2), int(w // 2)), rotation, scale)
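For readability, here is that warp.py change as a small standalone snippet. The values of w, rotation and scale are made up purely so the line runs on its own; in DeepFaceLab they come from the sample generator.

```python
import cv2

# Illustrative values only, not DeepFaceLab's actual inputs.
w, rotation, scale = 448, 10.0, 1.0

# Original line in core/imagelib/warp.py:
#   random_transform_mat = cv2.getRotationMatrix2D((w // 2, w // 2), rotation, scale)
# Updated line: cast the centre coordinates to plain ints, which stricter
# OpenCV builds accept without a type error.
random_transform_mat = cv2.getRotationMatrix2D((int(w // 2), int(w // 2)), rotation, scale)
print(random_transform_mat)
```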

Thanks in advance, and let me know if I missed any details.

==================== Model Summary ====================
Model name: DF-UD448_SAEHD
Current iteration: 248471
------------------ Model Options ------------------
resolution: 448
face_type: wf
models_opt_on_gpu: True
archi: df-ud
ae_dims: 256
e_dims: 64
d_dims: 64
d_mask_dims: 22
masked_training: True
uniform_yaw: True
lr_dropout: n
random_warp: True
gan_power: 0.0
true_face_power: 0.0
face_style_power: 0.0
bg_style_power: 0.0
ct_mode: none
clipgrad: False
pretrain: False
autobackup_hour: 0
write_preview_history: False
target_iter: 0
random_flip: False
batch_size: 8
eyes_mouth_prio: False
blur_out_mask: False
adabelief: True
random_hsv_power: 0.0
random_src_flip: False
random_dst_flip: True
gan_patch_size: 56
gan_dims: 16
------------------- Running On --------------------
Device index: 0
Name: NVIDIA GeForce RTX 3070 Ti
VRAM: 5.34GB

Starting. Press "Enter" to stop training and save model.
Error: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Pad_7 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

     [[concat_5/concat/_465]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

(1) Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Pad_7 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations. 0 derived errors ignored.

Errors may have originated from an input operation. Input Source operations connected to node Pad_7: LeakyRelu_6 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py:29)

Input Source operations connected to node Pad_7: LeakyRelu_6 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py:29)

Original stack trace for 'Pad_7':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 408, in on_initialize
    gpu_dst_code = self.inter(self.encoder(gpu_warped_dst))
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 119, in forward
    x = self.down1(x)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 63, in forward
    x = down(x)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 43, in forward
    x = self.conv1(x)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 87, in forward
    x = tf.pad (x, padding, mode='CONSTANT')
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3528, in pad
    result = gen_array_ops.pad(tensor, paddings, name=name)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 6487, in pad
    "Pad", input=input, paddings=paddings, name=name)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
    return fn(*args)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node Pad_7}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

     [[concat_5/concat/_465]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

(1) Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node Pad_7}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations. 0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
    losses = self.onTrainOneIter()
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
    src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
    self.target_dstm_em:target_dstm_em,
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
    run_metadata_ptr)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
    run_metadata)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Pad_7 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

     [[concat_5/concat/_465]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

(1) Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node Pad_7 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations. 0 derived errors ignored.

Errors may have originated from an input operation. Input Source operations connected to node Pad_7: LeakyRelu_6 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py:29)

Input Source operations connected to node Pad_7: LeakyRelu_6 (defined at C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py:29)

Original stack trace for 'Pad_7':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 408, in on_initialize
    gpu_dst_code = self.inter(self.encoder(gpu_warped_dst))
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 119, in forward
    x = self.down1(x)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 63, in forward
    x = down(x)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 43, in forward
    x = self.conv1(x)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
    return self.forward(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 87, in forward
    x = tf.pad (x, padding, mode='CONSTANT')
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3528, in pad
    result = gen_array_ops.pad(tensor, paddings, name=name)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 6487, in pad
    "Pad", input=input, paddings=paddings, name=name)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "C:\Users\Frank\Desktop\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)
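The hint repeated throughout the log refers to TensorFlow 1.x graph-mode RunOptions. A minimal sketch of passing that flag to session.run, purely illustrative and not DeepFaceLab's own code, assuming a TF build that still exposes the compat.v1 API:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # the flag only applies in graph mode, as the hint says

# Ask TF to print the tensors resident on the device if an OOM occurs during this run.
run_options = tf1.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf1.placeholder(tf.float32, shape=[None, 4])
y = tf.reduce_sum(x)

with tf1.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}, options=run_options))
```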

Loz13 · Mar 04 '22 10:03

I'm in exactly the same boat. You can get it working by disabling GPU optimization in the model options; that way the CPU helps the GPU, but it is really slow...
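Assuming "GPU optimization" here means the models_opt_on_gpu option shown in the model summary above, the idea behind answering no is to keep optimizer state in host RAM instead of VRAM. A minimal, hand-rolled TF1-style sketch of that idea only; DeepFaceLab's leras optimizers handle this placement internally and differently:

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Tiny stand-in model: weights and forward pass on the GPU.
with tf1.device('/GPU:0'):
    x = tf1.placeholder(tf.float32, [None, 64])
    w = tf1.get_variable('w', [64, 1])
    loss = tf.reduce_mean(tf.square(tf1.matmul(x, w)))

grad = tf1.gradients(loss, [w])[0]

# Hand-rolled momentum update whose state buffer lives on the CPU,
# illustrating "optimizer state off the GPU": less VRAM used, but every
# step pays for host<->device transfers, hence the slowdown.
with tf1.device('/CPU:0'):
    momentum = tf1.get_variable('w_momentum', [64, 1], initializer=tf1.zeros_initializer())
    new_momentum = tf1.assign(momentum, 0.9 * momentum + grad)

with tf1.device('/GPU:0'):
    train_op = tf1.assign_sub(w, 1e-3 * new_momentum)

config = tf1.ConfigProto(allow_soft_placement=True)  # fall back to CPU if no GPU is visible
with tf1.Session(config=config) as sess:
    sess.run(tf1.global_variables_initializer())
    sess.run(train_op, {x: np.random.rand(8, 64).astype(np.float32)})
```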

KaiserRRR · Mar 25 '22 11:03

Did you ever find the answer? If so, would you mind sharing it and closing this issue?

joolstorrentecalo · Jun 08 '23 23:06

The issue you are facing with the trained model is due to the resolution. Your GPU does not have enough memory for this high a resolution, hence the tensor allocation failure. You can check the model's specifications against your GPU:

Resource exhausted: OOM when allocating tensor with shape[1024,116,116] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
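As a rough sanity check on that line, the single activation TensorFlow could not fit is already sizeable on its own (assuming 4-byte float32 elements, since the log only says "type float"):

```python
# Size of the tensor from the OOM message: shape [1024, 116, 116], assumed float32.
elements = 1024 * 116 * 116
bytes_needed = elements * 4                 # 4 bytes per float32 element
print(f"{bytes_needed / 2**20:.1f} MiB")    # ~52.6 MiB for this one activation alone
```

A 448-resolution DF-UD model at batch size 8 holds many activations of this size at once, which is why lowering the resolution or the batch size (or moving optimizer state off the GPU, as suggested above) is the usual way out of this OOM.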

tvudaykumar · Aug 08 '23 13:08