DeepFaceLab_Linux
XSeg_train unable to run
Environment
$ uname -a
Linux GPU-01 5.4.0-120-generic #136-Ubuntu SMP Fri Jun 10 13:40:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ nvidia-smi
Sat Jul 2 12:23:56 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05 Driver Version: 510.73.05 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:17:00.0 Off | N/A |
| 0% 35C P8 12W / 250W | 1091MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:65:00.0 Off | N/A |
| 0% 30C P8 11W / 250W | 8MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:66:00.0 Off | N/A |
| 0% 31C P8 10W / 250W | 6690MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
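The table above shows the three cards with different amounts of memory already in use. A minimal sketch for reading the same figures programmatically and picking the card with the most free memory is given below. The helper names (`query_gpu_memory`, `parse_gpu_memory`, `least_used_gpu`) are made up for illustration. Whether memory already in use on GPU 0 has anything to do with the failure reported here is not established.

```python
import subprocess

# nvidia-smi can print selected fields as plain CSV instead of the table.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def query_gpu_memory():
    """Run nvidia-smi and return its CSV output as text (needs the NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def least_used_gpu(rows):
    """Return the index of the GPU with the most free memory."""
    return max(rows, key=lambda row: row[2] - row[1])[0]


# Example with the figures from the table above (hypothetical usage):
#   rows = parse_gpu_memory(query_gpu_memory())
#   print(least_used_gpu(rows))
```

The chosen index can then be exported as `CUDA_VISIBLE_DEVICES` before starting a script, so that only that card is visible to TensorFlow.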
Steps to reproduce
I am following the guide from Druuzil Tech and Games here.
- Copy data_src.mp4 into workspace
- Copy the 1.7GB RTT model into workspace/model
- Copy the 9GB RTM WF Faceset into workspace/data_dst/aligned
- ./2_extract_image_from_data_src.sh
- ./4_data_src_extract_faces_S3FD.sh
  - Faced a GPU error and downgraded tensorflow-gpu to 2.3.1 as per #20
- ./5_XSeg_data_src_mask_apply.sh
- ./5_XSeg_train.sh
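For anyone trying to reproduce this, the script sequence above can be driven from one place so that it stops at the first step that fails. This is only a sketch: `run_steps` is a hypothetical helper, it assumes it is started from the directory that holds the scripts, and it assumes a failing script exits with a non-zero status, which has not been checked for these scripts. The scripts prompt for options, so standard input is left attached to the terminal.

```python
import subprocess

# The scripts named in the steps above, in the order they were run.
STEPS = [
    "./2_extract_image_from_data_src.sh",
    "./4_data_src_extract_faces_S3FD.sh",
    "./5_XSeg_data_src_mask_apply.sh",
    "./5_XSeg_train.sh",
]


def run_steps(steps, run=subprocess.run):
    """Run each script in order; return the first failing step, or None if all pass."""
    for step in steps:
        result = run([step])  # inherits stdin/stdout, so prompts still work
        if result.returncode != 0:
            return step
    return None


# Hypothetical usage:
#   failed = run_steps(STEPS)
#   print("all steps finished" if failed is None else f"stopped at {failed}")
```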
Error Output
Loading samples: 100%|#########################################################################################| 25461/25461 [00:57<00:00, 439.62it/s]
Loaded 63012 packed faces from /data/home/kryan/DeepFaceLab_Linux/workspace/data_dst/aligned
Filtering: 100%|##############################################################################################| 88473/88473 [00:58<00:00, 1514.01it/s]
Using 278 segmented samples.
================== Model Summary ==================
== ==
== Model name: XSeg ==
== ==
== Current iteration: 1 ==
== ==
==---------------- Model Options ----------------==
== ==
== face_type: wf ==
== pretrain: False ==
== batch_size: 8 ==
== ==
==----------------- Running On ------------------==
== ==
== Device index: 0 ==
== Name: NVIDIA GeForce GTX 1080 Ti ==
== VRAM: 9.03GB ==
== ==
===================================================
Starting. Press "Enter" to stop training and save model.
: cannot connect to X server .8308]
Error: DNN Backward Data function launch failure : input shape([8,32,258,258]) filter shape([3,3,32,1])
[[node gradients/Conv2D_30_grad/Conv2DBackpropInput (defined at /DeepFaceLab_Linux/DeepFaceLab/core/leras/ops/__init__.py:55) ]]
Errors may have originated from an input operation.
Input Source operations connected to node gradients/Conv2D_30_grad/Conv2DBackpropInput:
XSeg/out_conv/weight/read (defined at /DeepFaceLab_Linux/DeepFaceLab/core/leras/layers/Conv2D.py:61)
Original stack trace for 'gradients/Conv2D_30_grad/Conv2DBackpropInput':
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/mainscripts/Trainer.py", line 58, in trainerThread
debug=debug)
File "/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 17, in __init__
super().__init__(*args, force_model_class_name='XSeg', **kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/models/ModelBase.py", line 193, in __init__
self.on_initialize()
File "/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 118, in on_initialize
gpu_loss_gvs += [ nn.gradients ( gpu_loss, self.model.get_weights() ) ]
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/ops/__init__.py", line 55, in tf_gradients
grads = gradients.gradients(loss, vars, colocate_gradients_with_ops=True )
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 172, in gradients
unconnected_gradients)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 669, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 336, in _MaybeCompile
return grad_fn() # Exit early
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 669, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/nn_grad.py", line 596, in _Conv2DGrad
data_format=data_format),
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1300, in conv2d_backprop_input
name=name)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3485, in _create_op_internal
op_def=op_def)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'Conv2D_30', defined at:
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
[elided 4 identical lines from previous traceback]
File "/DeepFaceLab_Linux/DeepFaceLab/models/ModelBase.py", line 193, in __init__
self.on_initialize()
File "/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 103, in on_initialize
gpu_pred_logits_t, gpu_pred_t = self.model.flow(gpu_input_t, pretrain=self.pretrain)
File "/DeepFaceLab_Linux/DeepFaceLab/facelib/XSegNet.py", line 85, in flow
return self.model(x, pretrain=pretrain)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/models/ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/models/XSeg.py", line 167, in forward
logits = self.out_conv(x)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/layers/LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/layers/Conv2D.py", line 101, in forward
x = tf.nn.conv2d(x, weight, strides, 'VALID', dilations=dilations, data_format=nn.data_format)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2273, in conv2d
name=name)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 979, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3485, in _create_op_internal
op_def=op_def)
Traceback (most recent call last):
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: DNN Backward Data function launch failure : input shape([8,32,258,258]) filter shape([3,3,32,1])
[[{{node gradients/Conv2D_30_grad/Conv2DBackpropInput}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/home/kryan/DeepFaceLab_Linux/DeepFaceLab/mainscripts/Trainer.py", line 129, in trainerThread
iter, iter_time = model.train_one_iter()
File "/data/home/kryan/DeepFaceLab_Linux/DeepFaceLab/models/ModelBase.py", line 474, in train_one_iter
losses = self.onTrainOneIter()
File "/data/home/kryan/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 194, in onTrainOneIter
loss = self.train (image_np, target_np)
File "/data/home/kryan/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 136, in train
l, _ = nn.tf_sess.run ( [loss, loss_gv_op], feed_dict={self.model.input_t :input_np, self.model.target_t :target_np })
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
run_metadata_ptr)
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/data/home/kryan/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: DNN Backward Data function launch failure : input shape([8,32,258,258]) filter shape([3,3,32,1])
[[node gradients/Conv2D_30_grad/Conv2DBackpropInput (defined at /DeepFaceLab_Linux/DeepFaceLab/core/leras/ops/__init__.py:55) ]]
Errors may have originated from an input operation.
Input Source operations connected to node gradients/Conv2D_30_grad/Conv2DBackpropInput:
XSeg/out_conv/weight/read (defined at /DeepFaceLab_Linux/DeepFaceLab/core/leras/layers/Conv2D.py:61)
Original stack trace for 'gradients/Conv2D_30_grad/Conv2DBackpropInput':
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/mainscripts/Trainer.py", line 58, in trainerThread
debug=debug)
File "/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 17, in __init__
super().__init__(*args, force_model_class_name='XSeg', **kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/models/ModelBase.py", line 193, in __init__
self.on_initialize()
File "/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 118, in on_initialize
gpu_loss_gvs += [ nn.gradients ( gpu_loss, self.model.get_weights() ) ]
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/ops/__init__.py", line 55, in tf_gradients
grads = gradients.gradients(loss, vars, colocate_gradients_with_ops=True )
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 172, in gradients
unconnected_gradients)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 669, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 336, in _MaybeCompile
return grad_fn() # Exit early
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py", line 669, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/nn_grad.py", line 596, in _Conv2DGrad
data_format=data_format),
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1300, in conv2d_backprop_input
name=name)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3485, in _create_op_internal
op_def=op_def)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
self._traceback = tf_stack.extract_stack()
...which was originally created as op 'Conv2D_30', defined at:
File "/.conda/envs/deepfacelab/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
[elided 4 identical lines from previous traceback]
File "/DeepFaceLab_Linux/DeepFaceLab/models/ModelBase.py", line 193, in __init__
self.on_initialize()
File "/DeepFaceLab_Linux/DeepFaceLab/models/Model_XSeg/Model.py", line 103, in on_initialize
gpu_pred_logits_t, gpu_pred_t = self.model.flow(gpu_input_t, pretrain=self.pretrain)
File "/DeepFaceLab_Linux/DeepFaceLab/facelib/XSegNet.py", line 85, in flow
return self.model(x, pretrain=pretrain)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/models/ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/models/XSeg.py", line 167, in forward
logits = self.out_conv(x)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/layers/LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
File "/DeepFaceLab_Linux/DeepFaceLab/core/leras/layers/Conv2D.py", line 101, in forward
x = tf.nn.conv2d(x, weight, strides, 'VALID', dilations=dilations, data_format=nn.data_format)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2273, in conv2d
name=name)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 979, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/.conda/envs/deepfacelab/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3485, in _create_op_internal
op_def=op_def)
@kwokyto Hi, where did you get the 1.7GB RTT model?
conda create -n deepfacelab -c main python=3.7 cudnn=7.6.5 cudatoolkit=10.1.243
Replace the contents of requirements-cuda.txt with this:
tqdm
numpy
numexpr
h5py==3.1.0
opencv-python==4.1.0.25
ffmpeg-python==0.1.17
scikit-image==0.14.2
scipy==1.4.1
colorama
tensorflow-gpu==2.4.0
pyqt5
tf2onnx==1.9.3
ffmpeg
python -m pip install -r ./DeepFaceLab/requirements-cuda.txt
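After reinstalling, it may be worth confirming what the environment actually ended up with before starting XSeg training again, since the tensorflow-gpu build and the CUDA/cuDNN packages have to agree with each other. A minimal sketch, to be run inside the `deepfacelab` conda environment. `tf_gpu_report` is a hypothetical helper name. The TensorFlow calls it uses are `tf.test.is_built_with_cuda` and `tf.config.list_physical_devices`.

```python
def tf_gpu_report():
    """Collect basic facts about the TensorFlow install in the active environment."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None}
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means TensorFlow could not see any GPU.
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


# Hypothetical usage:
#   print(tf_gpu_report())
```

If the `gpus` list comes back empty, the CUDA or cuDNN libraries were probably not found at import time, and the messages TensorFlow prints on import usually name the missing library.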