GTX 1080 Ti vs RTX 3080
Hi,
I've switched computers from a desktop with: Win10, GTX 1080 Ti GPU, AMD FX-8379 CPU, 16GB memory
to: Win11, RTX 3080 GPU, i9-12900KF CPU, 32GB memory.
I am trying to run the model with these settings:

================== Model Summary ==================
Model name: DF-UD384_SAEHD
Current iteration: 561403
---------------- Model Options -----------------
resolution: 384
face_type: wf
models_opt_on_gpu: True
archi: df-ud
ae_dims: 352
e_dims: 88
d_dims: 88
d_mask_dims: 16
masked_training: True
eyes_mouth_prio: True
uniform_yaw: False
adabelief: True
lr_dropout: y
random_warp: False
true_face_power: 0.2
face_style_power: 0.0
bg_style_power: 0.0
ct_mode: none
clipgrad: False
pretrain: False
autobackup_hour: 0
write_preview_history: False
target_iter: 0
random_src_flip: False
random_dst_flip: True
batch_size: 4
gan_power: 0.0
gan_patch_size: 48
gan_dims: 16
blur_out_mask: False
random_hsv_power: 0.0
------------------ Running On ------------------
Device index: 0
Name: NVIDIA GeForce RTX 3080
VRAM: 7.27GB
It has been working fine on the GTX 1080 Ti for over 500k iterations, but it won't run on the new rig (RTX 3080).
I'm getting the following errors:

Error: 2 root error(s) found.
(0) Resource exhausted: failed to allocate memory
[[node mul_129 (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[concat_4/concat/_547]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: failed to allocate memory [[node mul_129 (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations. 0 derived errors ignored.
Errors may have originated from an input operation. Input Source operations connected to node mul_129: src_dst_opt/vs_inter/dense2/weight_0/read (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)
Input Source operations connected to node mul_129: src_dst_opt/vs_inter/dense2/weight_0/read (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)
Original stack trace for 'mul_129':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in init
    self.on_initialize()
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 564, in on_initialize
    src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 64, in get_update_op
    v_t = self.beta_2*vs + (1.0-self.beta_2) * tf.square(g-m_t)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1076, in _run_op
    return tensor_oper(a.value(), *args, **kwargs)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1400, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1710, in _mul_dispatch
    return multiply(x, y, name=name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 530, in multiply
    return gen_math_ops.mul(x, y, name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6245, in mul
    "Mul", x=x, y=y, name=name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in init
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)
Traceback (most recent call last):
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
    return fn(*args)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: failed to allocate memory
[[{{node mul_129}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[concat_4/concat/_547]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: failed to allocate memory [[{{node mul_129}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations. 0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
    losses = self.onTrainOneIter()
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
    src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
    self.target_dstm_em:target_dstm_em,
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
    run_metadata_ptr)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
    run_metadata)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: failed to allocate memory
[[node mul_129 (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[concat_4/concat/_547]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: failed to allocate memory [[node mul_129 (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations. 0 derived errors ignored.
Errors may have originated from an input operation. Input Source operations connected to node mul_129: src_dst_opt/vs_inter/dense2/weight_0/read (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)
Input Source operations connected to node mul_129: src_dst_opt/vs_inter/dense2/weight_0/read (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)
Original stack trace for 'mul_129':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in init
    self.on_initialize()
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 564, in on_initialize
    src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 64, in get_update_op
    v_t = self.beta_2*vs + (1.0-self.beta_2) * tf.square(g-m_t)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1076, in _run_op
    return tensor_oper(a.value(), *args, **kwargs)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1400, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1710, in _mul_dispatch
    return multiply(x, y, name=name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 530, in multiply
    return gen_math_ops.mul(x, y, name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6245, in mul
    "Mul", x=x, y=y, name=name)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in init
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)
Any suggestions? The paging file is enabled at 500GB, and hardware-accelerated GPU scheduling is enabled.
"==------------------ Running On ------------------== == == == Device index: 0 == == Name: NVIDIA GeForce RTX 3080 == == VRAM: 7.27GB == == =="
The RTX 3080 comes with either 10GB or 12GB of VRAM depending on the model, while the 1080 Ti has 11GB. On top of that, it looks like you only have 7.27GB of VRAM available to the process: since you're running Windows, a portion of your VRAM is being consumed by the desktop itself as well as other applications.
The rest of the error output also points to a memory problem:
"(0) Resource exhausted: failed to allocate memory [[node mul_129 (defined at D:\New folder\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to"
It says "failed to allocate memory" and indicated OOM (Out Of Memory).
I don't have many suggestions for saving memory on Windows beyond closing everything you can while running DFL, and possibly even lowering your resolution (higher resolutions obviously require more memory).
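To figure out what is actually worth closing, you can also list the processes currently holding VRAM, again with pynvml. Rough sketch only; on Windows the per-process figure can come back as None when the driver doesn't expose it, in which case Task Manager's GPU columns are the fallback:

    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    # graphics clients (desktop compositor, browsers, games) plus compute clients (DFL itself)
    procs = (pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
             + pynvml.nvmlDeviceGetComputeRunningProcesses(handle))

    for p in procs:
        used = "n/a" if p.usedGpuMemory is None else f"{p.usedGpuMemory / 1024**2:.0f} MiB"
        print(f"pid {p.pid}: {used}")   # match the pid up in Task Manager to see what it is

    pynvml.nvmlShutdown()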
If that doesn't work for you, then your only option is to use a barebones GUI Linux system (like XFCE, which is very light on resource usage), or even a headless system and manage the process over SSH (which is the only sensible choice IMO).
Did you ever find the answer? If so, would you mind sharing it and closing this issue?