faceswap-playground

Error when training a model: "tensorflow.python.eager.profiler.ProfilerAlreadyRunningError: Another profiler is running"

Open VicGrygorchyk opened this issue 6 years ago • 9 comments

Hi! When I run the command python3 faceswap.py train -A ./photo/fst -B ./photo/snd -m ./photo/models/ I get the error shown in the log below. It might be worth mentioning that I compiled tensorflow myself, because when I used pip install tensorflow I got a tensorflow not found error. The extract command works without problems. I also had the error can't find module named 'numpy.core._multiarray_umath', as mentioned in this issue https://github.com/deepfakes/faceswap-playground/issues/261, so I updated numpy via pip install (but there is a comment that numpy 1.16 is broken, and I have 1.16.2).

crash_report.2019.03.03.222044908664.log 
03/03/2019 22:20:25 MainProcess     training_0      _base           load_generator            DEBUG    Loading generator: b
03/03/2019 22:20:25 MainProcess     training_0      _base           load_generator            DEBUG    input_size: 64, output_size: 64
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initializing TrainingDataGenerator: (model_input_size: 64, model_output_shape: 64, training_opts: {'alignments': {'a': '/home/faceswap/photo/fst/alignments.json', 'b': '/home/faceswap/photo/snd/alignments.json'}, 'preview_scaling': 1.0, 'no_flip': False, 'preview_images': 14, 'training_size': 256, 'coverage_ratio': 0.625, 'mask_type': None, 'warp_to_landmarks': False, 'no_logs': False}, landmarks: False)
03/03/2019 22:20:25 MainProcess     training_0      training_data   set_mask_function         DEBUG    Mask function: None
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initializing ImageManipulation: (input_size: 64, output_size: 64, coverage_ratio: 0.625)
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initialized ImageManipulation
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initialized TrainingDataGenerator
03/03/2019 22:20:25 MainProcess     training_0      training_data   minibatch_ab              DEBUG    Queue batches: (image_count: 960, batchsize: 64, side: 'b', do_shuffle: True, is_timelapse: False)
03/03/2019 22:20:25 MainProcess     training_0      queue_manager   add_queue                 DEBUG    QueueManager adding: (name: 'train_b', maxsize: 512)
03/03/2019 22:20:25 MainProcess     training_0      queue_manager   add_queue                 DEBUG    QueueManager added: (name: 'train_b')
03/03/2019 22:20:25 MainProcess     training_0      multithreading  __init__                  DEBUG    Initializing MultiThread: (target: 'load_batches', thread_count: 1)
03/03/2019 22:20:25 MainProcess     training_0      multithreading  __init__                  DEBUG    Initialized MultiThread: 'load_batches'
03/03/2019 22:20:25 MainProcess     training_0      multithreading  start                     DEBUG    Starting thread(s): 'load_batches'
03/03/2019 22:20:25 MainProcess     training_0      multithreading  start                     DEBUG    Starting th
  File "/home/faceswap/scripts/train.py", line 97, in process
    self.end_thread(thread, err)
  File "/home/faceswap/scripts/train.py", line 122, in end_thread
    thread.join()
  File "/home/faceswap/lib/multithreading.py", line 179, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/faceswap/lib/multithreading.py", line 117, in run
    self._target(*self._args, **self._kwargs)
  File "/home/faceswap/scripts/train.py", line 148, in training
    raise err
  File "/home/faceswap/scripts/train.py", line 138, in training
    self.run_training_cycle(model, trainer)
  File "/home/faceswap/scripts/train.py", line 210, in run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "/home/faceswap/plugins/train/trainer/_base.py", line 149, in train_one_step
    self.log_tensorboard(side, side_loss)
  File "/home/faceswap/plugins/train/trainer/_base.py", line 172, in log_tensorboard
    self.tensorboard[side].on_batch_end(self.model.state.iterations, logs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/callbacks_v1.py", line 362, in on_batch_end
    profiler.start()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/eager/profiler.py", line 70, in start
    raise ProfilerAlreadyRunningError('Another profiler is running.')
tensorflow.python.eager.profiler.ProfilerAlreadyRunningError: Another profiler is running.
PyWavelets==1.0.2
pyxdg==0.25
PyYAML==3.11
pyzmq==18.0.0
qtconsole==4.4.3
requests==2.9.1
scikit-image==0.14.2
scikit-learn==0.20.3
scipy==1.2.1
screen-resolution-extra==0.0.0
Send2Trash==1.5.0
six==1.12.0
ssh-import-id==5.5
sympy==1.3
system-service==0.3
tensorboard==1.13.0
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
toolz==0.9.0
tornado==5.1.1
tqdm==4.31.1
traitlets==4.3.2
ufw==0.35
unattended-upgrades==0.1
urllib3==1.13.1
virtualenv==16.4.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.14.1
widgetsnbextension==3.4.2
xkit==0.0.0

Please point out what I'm doing wrong with my setup. Should I use a different tensorflow version? In case it helps, I see these logs before the error:

03/03/2019 22:20:20 INFO     Log level set to: INFO
Using TensorFlow backend.
03/03/2019 22:20:22 INFO     Model A Directory: /home/faceswap/photo/solo
03/03/2019 22:20:22 INFO     Model B Directory: /home/faceswap/photo/ford
03/03/2019 22:20:22 INFO     Training data directory: /home/faceswap/photo/models
03/03/2019 22:20:22 INFO     ===============================================
03/03/2019 22:20:22 INFO     - Starting                                    -
03/03/2019 22:20:22 INFO     - Press 'ENTER' to save and quit              -
03/03/2019 22:20:22 INFO     - Press 'S' to save model weights immediately -
03/03/2019 22:20:22 INFO     ===============================================
03/03/2019 22:20:23 INFO     Loading data, this may take a while...
03/03/2019 22:20:23 INFO     Loading Model from Original plugin...
03/03/2019 22:20:24 INFO     Loading config: '/home/faceswap/config/train.ini'
03/03/2019 22:20:24 WARNING  No existing state file found. Generating.
03/03/2019 22:20:25 WARNING  Failed loading existing training data. Generating new models
03/03/2019 22:20:25 INFO     Loading Trainer from Original plugin...
03/03/2019 22:20:25 INFO     Enabled TensorBoard Logging
03/03/2019 22:20:44 CRITICAL Error caught! Exiting...
03/03/2019 22:20:44 ERROR    Caught exception in thread: 'training_0'
03/03/2019 22:20:46 ERROR    Got Exception on main handler:
Traceback (most recent call last):

Thanks to anyone trying to help!

VicGrygorchyk • Mar 03 '19 22:03

Did you stop bazel before starting to train?

# bazel shutdown

I did the same thing as you, compiling tensorflow from source, and I got this message. It didn't come back once I stopped the bazel server.

Kirin-kun • Mar 04 '19 09:03

Same issue, any solutions to solve this error? Thanks!

leinine • Mar 18 '19 06:03

Same issue, any solutions to solve this error? Thanks!

I have not found a solution. bazel shutdown didn't help me. I gave up and switched to the docker image of this project.

VicGrygorchyk • Mar 18 '19 08:03

Without a full crash_report we cannot help diagnose these issues.

torzdf • Mar 18 '19 09:03

I have the same problem. Extract works. Train doesn't. Logs have been attached. Thanks for your help. crash_report.2019.06.17.081214571325.log

seven110 • Jun 17 '19 15:06

Tensorflow 1.14 is neither tested nor supported. Downgrade.

torzdf • Jun 17 '19 15:06

Also here's the faceswap.log. Thanks. faceswap.log

seven110 • Jun 17 '19 15:06

Eager Profiler is a TF 1.14 feature. It is not supported. Downgrade.

torzdf • Jun 17 '19 15:06

Backed TF down to 1.13.0. Works now. Thanks for the fast reply!

seven110 • Jun 17 '19 15:06
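
For anyone landing here with the same error: the fix reported in this thread was downgrading from TensorFlow 1.14 to 1.13. A minimal sketch of a version check that could be run before training is below. The helper names (parse_version, is_supported_tensorflow) and the FIRST_UNSUPPORTED constant are hypothetical and not part of faceswap; the 1.14 cut-off is taken only from the maintainer's replies above, and no lower bound is checked because the thread does not state one.

```python
# Hypothetical helpers for illustration; not part of faceswap or TensorFlow.

# Assumption: 1.14 is the first unsupported release, per the replies in this thread.
FIRST_UNSUPPORTED = (1, 14)


def parse_version(version):
    """Turn a version string such as '1.13.1' into a tuple of ints: (1, 13, 1).

    Parsing stops at the first component that does not start with a digit,
    and trailing non-digit suffixes (e.g. '0-rc1') are cut off.
    """
    parts = []
    for piece in version.strip().split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    if len(parts) < 2:
        raise ValueError("Cannot read major.minor from %r" % version)
    return tuple(parts)


def is_supported_tensorflow(version):
    """Return True if the major.minor version is below the unsupported cut-off."""
    return parse_version(version)[:2] < FIRST_UNSUPPORTED
```

With TensorFlow installed, the check could be called as is_supported_tensorflow(tf.__version__) after import tensorflow as tf. A False result would suggest downgrading before running train.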