faceswap-playground
Error when training a model: "tensorflow.python.eager.profiler.ProfilerAlreadyRunningError: Another profiler is running"
Hi! When I run the command python3 faceswap.py train -A ./photo/fst -B ./photo/snd -m ./photo/models/ I get the error shown in the log below.
It might be worth mentioning that I compiled tensorflow myself, because when I used pip install tensorflow I got a "tensorflow not found" error.
The extract command works without problems.
Also, I previously had the error can't find module named 'numpy.core._multiarray_umath', as mentioned in this issue: https://github.com/deepfakes/faceswap-playground/issues/261, so I updated numpy via pip install (there is a comment there that numpy 1.16 is broken; I have 1.16.2).
crash_report.2019.03.03.222044908664.log
03/03/2019 22:20:25 MainProcess training_0 _base load_generator DEBUG Loading generator: b
03/03/2019 22:20:25 MainProcess training_0 _base load_generator DEBUG input_size: 64, output_size: 64
03/03/2019 22:20:25 MainProcess training_0 training_data __init__ DEBUG Initializing TrainingDataGenerator: (model_input_size: 64, model_output_shape: 64, training_opts: {'alignments': {'a': '/home/faceswap/photo/fst/alignments.json', 'b': '/home/faceswap/photo/snd/alignments.json'}, 'preview_scaling': 1.0, 'no_flip': False, 'preview_images': 14, 'training_size': 256, 'coverage_ratio': 0.625, 'mask_type': None, 'warp_to_landmarks': False, 'no_logs': False}, landmarks: False)
03/03/2019 22:20:25 MainProcess training_0 training_data set_mask_function DEBUG Mask function: None
03/03/2019 22:20:25 MainProcess training_0 training_data __init__ DEBUG Initializing ImageManipulation: (input_size: 64, output_size: 64, coverage_ratio: 0.625)
03/03/2019 22:20:25 MainProcess training_0 training_data __init__ DEBUG Initialized ImageManipulation
03/03/2019 22:20:25 MainProcess training_0 training_data __init__ DEBUG Initialized TrainingDataGenerator
03/03/2019 22:20:25 MainProcess training_0 training_data minibatch_ab DEBUG Queue batches: (image_count: 960, batchsize: 64, side: 'b', do_shuffle: True, is_timelapse: False)
03/03/2019 22:20:25 MainProcess training_0 queue_manager add_queue DEBUG QueueManager adding: (name: 'train_b', maxsize: 512)
03/03/2019 22:20:25 MainProcess training_0 queue_manager add_queue DEBUG QueueManager added: (name: 'train_b')
03/03/2019 22:20:25 MainProcess training_0 multithreading __init__ DEBUG Initializing MultiThread: (target: 'load_batches', thread_count: 1)
03/03/2019 22:20:25 MainProcess training_0 multithreading __init__ DEBUG Initialized MultiThread: 'load_batches'
03/03/2019 22:20:25 MainProcess training_0 multithreading start DEBUG Starting thread(s): 'load_batches'
03/03/2019 22:20:25 MainProcess training_0 multithreading start DEBUG Starting th
File "/home/faceswap/scripts/train.py", line 97, in process
self.end_thread(thread, err)
File "/home/faceswap/scripts/train.py", line 122, in end_thread
thread.join()
File "/home/faceswap/lib/multithreading.py", line 179, in join
raise thread.err[1].with_traceback(thread.err[2])
File "/home/faceswap/lib/multithreading.py", line 117, in run
self._target(*self._args, **self._kwargs)
File "/home/faceswap/scripts/train.py", line 148, in training
raise err
File "/home/faceswap/scripts/train.py", line 138, in training
self.run_training_cycle(model, trainer)
File "/home/faceswap/scripts/train.py", line 210, in run_training_cycle
trainer.train_one_step(viewer, timelapse)
File "/home/faceswap/plugins/train/trainer/_base.py", line 149, in train_one_step
self.log_tensorboard(side, side_loss)
File "/home/faceswap/plugins/train/trainer/_base.py", line 172, in log_tensorboard
self.tensorboard[side].on_batch_end(self.model.state.iterations, logs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/callbacks_v1.py", line 362, in on_batch_end
profiler.start()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/eager/profiler.py", line 70, in start
raise ProfilerAlreadyRunningError('Another profiler is running.')
tensorflow.python.eager.profiler.ProfilerAlreadyRunningError: Another profiler is running.
PyWavelets==1.0.2
pyxdg==0.25
PyYAML==3.11
pyzmq==18.0.0
qtconsole==4.4.3
requests==2.9.1
scikit-image==0.14.2
scikit-learn==0.20.3
scipy==1.2.1
screen-resolution-extra==0.0.0
Send2Trash==1.5.0
six==1.12.0
ssh-import-id==5.5
sympy==1.3
system-service==0.3
tensorboard==1.13.0
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
toolz==0.9.0
tornado==5.1.1
tqdm==4.31.1
traitlets==4.3.2
ufw==0.35
unattended-upgrades==0.1
urllib3==1.13.1
virtualenv==16.4.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.14.1
widgetsnbextension==3.4.2
xkit==0.0.0
Please point out what I'm doing wrong with my setup. Should I use a different tensorflow version here?
In case it is useful, these are the logs I see before the error:
03/03/2019 22:20:20 INFO Log level set to: INFO
Using TensorFlow backend.
03/03/2019 22:20:22 INFO Model A Directory: /home/faceswap/photo/solo
03/03/2019 22:20:22 INFO Model B Directory: /home/faceswap/photo/ford
03/03/2019 22:20:22 INFO Training data directory: /home/faceswap/photo/models
03/03/2019 22:20:22 INFO ===============================================
03/03/2019 22:20:22 INFO - Starting -
03/03/2019 22:20:22 INFO - Press 'ENTER' to save and quit -
03/03/2019 22:20:22 INFO - Press 'S' to save model weights immediately -
03/03/2019 22:20:22 INFO ===============================================
03/03/2019 22:20:23 INFO Loading data, this may take a while...
03/03/2019 22:20:23 INFO Loading Model from Original plugin...
03/03/2019 22:20:24 INFO Loading config: '/home/faceswap/config/train.ini'
03/03/2019 22:20:24 WARNING No existing state file found. Generating.
03/03/2019 22:20:25 WARNING Failed loading existing training data. Generating new models
03/03/2019 22:20:25 INFO Loading Trainer from Original plugin...
03/03/2019 22:20:25 INFO Enabled TensorBoard Logging
03/03/2019 22:20:44 CRITICAL Error caught! Exiting...
03/03/2019 22:20:44 ERROR Caught exception in thread: 'training_0'
03/03/2019 22:20:46 ERROR Got Exception on main handler:
Traceback (most recent call last):
Thanks to anyone trying to help!
Did you stop bazel before starting to train?
# bazel shutdown
I did the same thing as you, compiling tensorflow from source, and I got this message. It didn't come back once I stopped the bazel server.
Same issue, any solutions to solve this error? Thanks!
I have not found a solution. bazel shutdown didn't help me.
I gave up and switched to the Docker image of this project.
Without a full crash_report we cannot help diagnose these issues.
I have the same problem. Extract works. Train doesn't. Logs have been attached. Thanks for your help. crash_report.2019.06.17.081214571325.log
TensorFlow 1.14 is neither tested nor supported. Downgrade.
Also here's the faceswap.log. Thanks. faceswap.log
Eager Profiler is a TF 1.14 feature. It is not supported. Downgrade.
Backed TF down to 1.13.0. Works now. Thanks for the fast reply!
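For anyone landing here with the same traceback, it may help to confirm which TensorFlow build Python actually imports, since a self-compiled install and a pip install can coexist. The snippet below is only a minimal sketch: the helper names are hypothetical, and the (1, 13) ceiling is an assumption taken from the replies above (1.13.0 works, 1.14 does not), not an official faceswap requirement.

```python
import importlib
import re

# Assumption based on the replies in this thread: 1.13.x works, 1.14 does not.
MAX_SUPPORTED = (1, 13)


def parse_major_minor(version):
    """Return (major, minor) from a string such as '1.13.1' or '1.14.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognised version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def is_supported(version, max_supported=MAX_SUPPORTED):
    """True if the version is at or below the assumed supported ceiling."""
    return parse_major_minor(version) <= max_supported


def report_installed_tensorflow():
    """Print the version and location of the TensorFlow this interpreter imports."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        print("tensorflow is not importable from this interpreter")
        return None
    print("version:", tf.__version__)
    print("location:", tf.__file__)
    print("within assumed supported range:", is_supported(tf.__version__))
    return tf.__version__
```

Calling report_installed_tensorflow() from the same interpreter that runs faceswap.py shows whether the imported package matches what pip lists. If the location points at a different install than expected, or the version is 1.14, downgrading as described above is the first thing to try.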