DeepFaceLab
DeepFaceLab copied to clipboard
Can't load saved model
Expected behavior
When running 6) train SAEHD.bat again after having ran it once and saved the model, I would expect it to continue training.
Actual behavior
Throws the following error:
Running trainer.
Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete
[0] : AubreyGillian - latest
:
0
Loading AubreyGillian_SAEHD model...
Choose one or several GPU idxs (separated by comma).
[CPU] : CPU
[0] : GeForce RTX 2060
[0] Which GPU indexes to choose? :
0
Initializing models: 100%|###############################################################| 5/5 [00:12<00:00, 2.43s/it]
Loading samples: 100%|############################################################| 1045/1045 [00:08<00:00, 123.68it/s]
Loading samples: 100%|###############################################################| 617/617 [00:16<00:00, 38.24it/s]
================ Model Summary =================
== ==
== Model name: AubreyGillian_SAEHD ==
== ==
== Current iteration: 10368 ==
== ==
==-------------- Model Options ---------------==
== ==
== resolution: 128 ==
== face_type: f ==
== models_opt_on_gpu: True ==
== archi: liae-ud ==
== ae_dims: 256 ==
== e_dims: 64 ==
== d_dims: 64 ==
== d_mask_dims: 22 ==
== masked_training: True ==
== eyes_mouth_prio: False ==
== uniform_yaw: False ==
== adabelief: True ==
== lr_dropout: n ==
== random_warp: True ==
== true_face_power: 0.0 ==
== face_style_power: 0.0 ==
== bg_style_power: 0.0 ==
== ct_mode: none ==
== clipgrad: True ==
== pretrain: False ==
== autobackup_hour: 1 ==
== write_preview_history: False ==
== target_iter: 0 ==
== random_flip: True ==
== batch_size: 20 ==
== gan_power: 0.0 ==
== gan_patch_size: 16 ==
== gan_dims: 16 ==
== ==
==---------------- Running On ----------------==
== ==
== Device index: 0 ==
== Name: GeForce RTX 2060 ==
== VRAM: 6.00GB ==
== ==
================================================
Starting. Press "Enter" to stop training and save model.
Error: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[512,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Conv2D_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:99) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[concat_39/concat/_809]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[512,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Conv2D_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:99) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node Conv2D_12:
Pad_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:97)
decoder/res0/conv1/weight/read (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:76)
Input Source operations connected to node Conv2D_12:
Pad_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:97)
decoder/res0/conv1/weight/read (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:76)
Original stack trace for 'Conv2D_12':
File "threading.py", line 884, in _bootstrap
File "threading.py", line 916, in _bootstrap_inner
File "threading.py", line 864, in run
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug,
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\ModelBase.py", line 189, in __init__
self.on_initialize()
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 381, in on_initialize
gpu_pred_src_src, gpu_pred_src_srcm = self.decoder(gpu_src_code)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 154, in forward
x = self.res0(x)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 68, in forward
x = self.conv1(inp)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 99, in forward
x = tf.nn.conv2d(x, weight, self.strides, 'VALID', dilations=self.dilations, data_format=nn.data_format)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2279, in conv2d
name=name)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 972, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3536, in _create_op_internal
op_def=op_def)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 1990, in __init__
self._traceback = tf_stack.extract_stack()
Traceback (most recent call last):
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
target_list, run_metadata)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[512,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node Conv2D_12}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[concat_39/concat/_809]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[512,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node Conv2D_12}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\mainscripts\Trainer.py", line 130, in trainerThread
iter, iter_time = model.train_one_iter()
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\ModelBase.py", line 462, in train_one_iter
losses = self.onTrainOneIter()
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 678, in onTrainOneIter
src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 538, in src_dst_train
self.target_dstm_em:target_dstm_em,
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
run_metadata_ptr)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
run_metadata)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[512,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Conv2D_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:99) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[concat_39/concat/_809]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[512,512,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Conv2D_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:99) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node Conv2D_12:
Pad_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:97)
decoder/res0/conv1/weight/read (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:76)
Input Source operations connected to node Conv2D_12:
Pad_12 (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:97)
decoder/res0/conv1/weight/read (defined at C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:76)
Original stack trace for 'Conv2D_12':
File "threading.py", line 884, in _bootstrap
File "threading.py", line 916, in _bootstrap_inner
File "threading.py", line 864, in run
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug,
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\ModelBase.py", line 189, in __init__
self.on_initialize()
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 381, in on_initialize
gpu_pred_src_src, gpu_pred_src_srcm = self.decoder(gpu_src_code)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 154, in forward
x = self.res0(x)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 68, in forward
x = self.conv1(inp)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 99, in forward
x = tf.nn.conv2d(x, weight, self.strides, 'VALID', dilations=self.dilations, data_format=nn.data_format)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2279, in conv2d
name=name)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 972, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3536, in _create_op_internal
op_def=op_def)
File "C:\Programs\DeepFaceLab_NVIDIA\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 1990, in __init__
self._traceback = tf_stack.extract_stack()
Steps to reproduce
Run 6) train SAEHD.bat once, then hit enter to stop and save, run 6) train SAEHD.bat again.
Other relevant information
- Command lined used (if not specified in steps to reproduce): The content of
6) train SAEHD.bat:
@echo off
call _internal\setenv.bat
"%PYTHON_EXECUTABLE%" "%DFL_ROOT%\main.py" train ^
--training-data-src-dir "%WORKSPACE%\data_src\aligned" ^
--training-data-dst-dir "%WORKSPACE%\data_dst\aligned" ^
--pretraining-data-dir "%INTERNAL%\pretrain_CelebA" ^
--model-dir "%WORKSPACE%\model" ^
--model SAEHD
pause
- Operating system and version: Windows 10 Home [64-bit] (10.0.19042)
- Python version: 3.9.1
Did you ever find the answer?
I honestly can't remember...