deep-voice-conversion

run problem with docker?

Open Xiyor opened this issue 7 years ago • 24 comments

Hi, I am interested in audio style transfer. I set up a Docker container (running a tensorflow-gpu image) on a host with 32 GB of memory and a 1080 Ti. Following your tutorial, I ran the command [python train1.py default] (default is a name I picked at random). It runs without stopping, stuck forever at epoch=1. What is wrong with my setup? Looking forward to your answer. Thanks.

Xiyor avatar Nov 15 '17 11:11 Xiyor

@Xiyor I think the queue runner is not working properly. To debug, set queue=False in train1.py, run it again, and see what message comes up.

andabi avatar Nov 16 '17 09:11 andabi

Thank you for your reply. The previous cause of the problem was that I had not added the TIMIT dataset under the datasets directory. After adding it, I found that the file names are strange: there are two different extensions, phn.txt and phn. So I modified the code in data_load.py. However, with queue=False the process still runs without stopping. I am confused about what is happening.

Xiyor avatar Nov 18 '17 10:11 Xiyor

I have the same issue on Linux and Mac. Any updates?

boussaffawalid avatar Nov 20 '17 21:11 boussaffawalid

@boussaffawalid I have not managed to run it successfully yet. I have decided to study TensorFlow first.

Xiyor avatar Nov 21 '17 14:11 Xiyor

@Xiyor @boussaffawalid Please check the paths (data_path and so on) in hparam.py again. If the paths are set incorrectly, the problem you mentioned can happen.
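One quick way to test the path hypothesis is to expand the glob pattern the data loader would search and count the matches. The sketch below is illustrative only: `count_matches` and the example pattern are hypothetical and are not code from the repository.

```python
import glob


def count_matches(pattern):
    """Return how many files match a glob pattern (0 suggests a wrong path)."""
    return len(glob.glob(pattern))


if __name__ == "__main__":
    # Hypothetical pattern; substitute the one built from data_path in your config.
    pattern = "datasets/timit/TIMIT/TRAIN/*/*/*.wav"
    print("{} files match {}".format(count_matches(pattern), pattern))
```

If the count is 0, the training script has nothing to load, which would be consistent with the symptoms described in this thread.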

andabi avatar Nov 22 '17 08:11 andabi

@andabi @boussaffawalid andabi is right. I checked the TIMIT dataset: some wav files have no matching phn file, and some phn files have no matching wav file. I wrote a script to find these outliers and then ran train1.py, and it ran successfully. Thanks.
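The script itself was not posted. A minimal sketch of that kind of check, assuming that a wav file and its phn file share the same path apart from the extension (`find_unpaired` is a hypothetical helper name), could look like this:

```python
import os


def find_unpaired(root, wav_ext=".wav", phn_ext=".phn"):
    """Return (wavs_without_phn, phns_without_wav) as sorted lists of path stems."""
    wav_stems, phn_stems = set(), set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            stem, ext = os.path.splitext(os.path.join(dirpath, name))
            # Compare extensions case-insensitively (.WAV and .wav both count).
            if ext.lower() == wav_ext:
                wav_stems.add(stem)
            elif ext.lower() == phn_ext:
                phn_stems.add(stem)
    return sorted(wav_stems - phn_stems), sorted(phn_stems - wav_stems)
```

Files with any other final extension (for example a name ending in phn.txt) are ignored by this sketch, so they would need to be renamed or handled separately.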

Xiyor avatar Nov 22 '17 09:11 Xiyor

I updated the data_path and now I have another issue; the log is below. I tested this with python3 on Mac and Windows.

Traceback (most recent call last):
  File "train1.py", line 91, in <module>
    train(logdir=logdir)
  File "train1.py", line 65, in train
    summ, gs = sess.run([summ_op, global_step])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

boussaffawalid avatar Nov 23 '17 19:11 boussaffawalid

@boussaffawalid Did you set queue=False? That helps to debug errors in data preprocessing: when preprocessing runs into a problem, the queue is closed. I am not certain this is your problem, but you can try it. In my case, my TIMIT dataset had some outliers and I had similar problems.
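The idea behind the queue=False tip can also be applied directly: call the loading function on every file in the main thread, so that a failure shows up as an ordinary exception instead of a closed queue. This is only a sketch; `find_failing_files` is a hypothetical helper, and `load_fn` stands in for whatever function the project uses to read and preprocess one file.

```python
def find_failing_files(paths, load_fn):
    """Call load_fn on each path in the main thread and collect (path, error) pairs."""
    failures = []
    for path in paths:
        try:
            load_fn(path)
        except Exception as exc:  # broad on purpose: report every kind of failure
            failures.append((path, repr(exc)))
    return failures
```

An empty result means every file loaded without raising; it does not rule out a file that hangs forever.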

Xiyor avatar Nov 24 '17 02:11 Xiyor

@andabi I still have a problem after switching to the tensorflow-gpu Docker container. The same code runs in the tensorflow-cpu container but fails in the GPU container: it hangs forever! Setting queue=False does not help; the problem still occurs. I then set num_thread=2 and added some logging, and found that a thread reaches the step librosa.load(wav_file, sr=sr) and cannot get any further; it hangs there. I could not figure out the problem. Could you help me please? Thanks.
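For a hang like this, one option is to have Python dump the stack of every thread if a call has not returned after some time. The sketch below uses the standard-library faulthandler module, which assumes Python 3 (the project was written for Python 2 at the time); `call_with_hang_dump` is a hypothetical wrapper name.

```python
import faulthandler
import sys


def call_with_hang_dump(fn, *args, timeout=30.0, file=None, **kwargs):
    """Run fn(*args, **kwargs); if it is still running after `timeout` seconds,
    dump the stack of every thread to `file` (default: stderr) and keep waiting."""
    if file is None:
        file = sys.stderr
    faulthandler.dump_traceback_later(timeout, exit=False, file=file)
    try:
        return fn(*args, **kwargs)
    finally:
        # Cancel the pending dump once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

For example, call_with_hang_dump(librosa.load, wav_file, sr=sr) would print where each thread is stuck if the load takes longer than 30 seconds. It only reports the hang; it does not interrupt it.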

Xiyor avatar Nov 24 '17 03:11 Xiyor

@Xiyor May I ask how you set your dataset path? I set my paths in hparams.py like this:

data_path_base = '/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/datasets'
logdir_path = '/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/logdir'

When I ran it, the number of wav files found was 0, so I printed the data path the script searches and got: /home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/datasets/timit/TIMIT/TRAIN/*/*/*.wav

And my path is: [image]

Is this a correct path? What was yours? Thank you so much.

HudsonHuang avatar Nov 24 '17 03:11 HudsonHuang

@HudsonHuang
Hello. I think your path is right. However, your TIMIT dataset looks unusual: the extension of your wav files is WAV, not wav. You could change the extension in the code so that it matches your files, and try again. Good luck. Thanks.
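Glob patterns are case-sensitive on Linux, so a pattern ending in *.wav will not match files named *.WAV. One way to sidestep this without editing the dataset is to compare extensions case-insensitively. A minimal sketch (`find_audio_files` is a hypothetical helper, not a function from the repository):

```python
import os


def find_audio_files(root, ext=".wav"):
    """Recursively list files under root whose extension equals ext, ignoring case."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() == ext.lower():
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

This walks the whole tree rather than a fixed directory depth, so it may pick up more files than the pattern used by the project.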

Xiyor avatar Nov 24 '17 04:11 Xiyor

@Xiyor If you are hanging on librosa.load(), maybe you need an audio backend (make sure ffmpeg is installed).
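Whether ffmpeg is visible to the Python process can be checked from Python itself, which is useful inside a container where the host's installation does not count. A small sketch, assuming Python 3 for shutil.which (`has_executable` is a hypothetical helper name):

```python
import shutil


def has_executable(name="ffmpeg"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("ffmpeg found on PATH:", has_executable("ffmpeg"))
```

This only confirms that the binary is on PATH in the environment where the script runs; it does not verify that it can decode a particular file.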

SriramS32 avatar Nov 24 '17 20:11 SriramS32

I updated the data path and added a few log messages to make sure that the wav files are loaded correctly. Now it crashes on the first epoch with the error below. Any suggestions?

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

During handling of the above exception, another exception occurred:

boussaffawalid avatar Nov 24 '17 20:11 boussaffawalid

@boussaffawalid First, the code seems to be written in Python 3. Second, setting queue=False is a good way to verify that the preprocessing has no bugs.

Xiyor avatar Nov 27 '17 09:11 Xiyor

@SriramS32 Thank you for the suggestion. ffmpeg is already installed, so I guess that is not the problem.

Xiyor avatar Nov 27 '17 09:11 Xiyor

@Xiyor I'm using Python 3. I tried with queue=False and got another error. Could it be because something is wrong with the database I'm using? I'm using this database: http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

Traceback (most recent call last):
  File "train1.py", line 91, in <module>
    train(logdir=logdir)
  File "train1.py", line 61, in train
    mfcc, ppg = get_batch(model.mode, model.batch_size)
  File "C:\dev\deep-voice-conversion-master\data_load.py", line 276, in get_batch
    target_wavs = sample(wav_files, batch_size)
  File "C:\Users\boussafaw\AppData\Local\Programs\Python\Python36\lib\random.py", line 317, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

boussaffawalid avatar Nov 27 '17 12:11 boussaffawalid

@boussaffawalid Sorry, I gave you wrong information: the code is written in Python 2. The error message indicates that len(wav_files) is less than batch_size; you could print some logs to check. Simply setting queue=False seems to make the code fail, so you need to understand the process and modify some code. Or you can ask @andabi.
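random.sample raises this ValueError whenever it is asked for more items than the list contains, so an empty or too-short wav_files list is enough to trigger it. A guard placed before the sampling call can turn it into a more helpful message. This is a sketch only; `sample_batch` is a hypothetical wrapper, not the project's get_batch.

```python
import random


def sample_batch(wav_files, batch_size):
    """Sample batch_size files, failing with a clearer message when too few exist."""
    if len(wav_files) < batch_size:
        raise ValueError(
            "found {} wav files but batch_size is {}; check the dataset path "
            "and file extensions".format(len(wav_files), batch_size)
        )
    return random.sample(wav_files, batch_size)
```

Lowering batch_size would silence the error only when some files are found; with zero files, the path or extension has to be fixed.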

Xiyor avatar Nov 27 '17 12:11 Xiyor

How can I download the TIMIT dataset?

zuoshaobo avatar Dec 19 '17 08:12 zuoshaobo

@zuoshaobo TIMIT is not free; the full version costs $250. You can get it at https://catalog.ldc.upenn.edu/ldc93s1 Or you can borrow it from a friend...

pmsinner avatar Dec 22 '17 19:12 pmsinner

@boussaffawalid Just in case, and to help anyone who tries this in the future: did you update both the TRAIN and TEST folders in the TIMIT dataset? I only corrected the TRAIN folder, and then got the error you see because I hadn't changed .WAV to .wav in the TEST folder. Hope that helps you, or someone!
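To avoid missing one of the two folders, the extension change can be scripted over both. The sketch below assumes that a plain rename is all that is needed (it does not convert the audio format) and that the dataset root contains TRAIN and TEST directories; `lowercase_wav_extensions` is a hypothetical helper name.

```python
from pathlib import Path


def lowercase_wav_extensions(dataset_root, splits=("TRAIN", "TEST")):
    """Rename *.WAV to *.wav under each split; return the number of renames per split."""
    renamed = {}
    for split in splits:
        count = 0
        # Collect the matches first so renaming does not disturb the directory walk.
        for path in sorted((Path(dataset_root) / split).rglob("*.WAV")):
            path.rename(path.with_suffix(".wav"))
            count += 1
        renamed[split] = count
    return renamed
```

The returned counts make it easy to see whether both splits were actually touched. Try it on a copy of the dataset first, since the rename is done in place.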

jswilson avatar Jan 05 '18 16:01 jswilson

@jswilson @zuoshaobo @Xiyor I made some changes that fix the issues we discussed above: fixing paths, upgrading to Python 3, and taking parameters from the command line. I also added a Mega link for downloading the database.

In case anyone is interested please check this fork: https://github.com/boussaffawalid/deep-voice-conversion

boussaffawalid avatar Jan 05 '18 22:01 boussaffawalid

@boussaffawalid Thank you for your code. I am hitting the error raise ValueError("Sample larger than population or is negative"). What does it mean? Should I change batch_size or something else?

These are my errors. I look forward to your answer.

Traceback (most recent call last):
  File "", line 1, in <module>
  File "G:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "G:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "G:/code/python/myfile/ASR/deep-voice-conversion-master/train1.py", line 117, in <module>
    train(logdir=logdir, hparams = hp)
  File "G:/code/python/myfile/ASR/deep-voice-conversion-master/train1.py", line 77, in train
    eval1.eval(logdir=logdir, hparams=hparams)
  File "G:\code\python\myfile\ASR\deep-voice-conversion-master\eval1.py", line 48, in eval
    mfcc, ppg = get_batch(model.mode, model.batch_size)
  File "G:\code\python\myfile\ASR\deep-voice-conversion-master\data_load.py", line 203, in get_batch
    target_wavs = sample(wav_files, batch_size)
  File "G:\anaconda\lib\random.py", line 317, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Hjwjames avatar Mar 19 '18 07:03 Hjwjames

@boussaffawalid I have run into the same problem. Did you resolve it? Thank you.

2018-05-17 15:57:18.150601: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150617: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150620: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150623: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150625: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
0%| | 0/11 [00:00<?, ?b/s]Exception KeyError: KeyError(<weakref at 0x7f1226913260; to 'tqdm' at 0x7f1226afb0d0>,) in <bound method tqdm.__del__ of 0%| | 0/11 [00:01<?, ?b/s]> ignored
Traceback (most recent call last):
  File "/home/human-machine/Speech/deep-voice-conversion-master/train1.py", line 90, in <module>
    train(logdir='./logdir/default/train1', queue=True)
  File "/home/human-machine/Speech/deep-voice-conversion-master/train1.py", line 57, in train
    sess.run(train_op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

tiankong-hut avatar May 17 '18 13:05 tiankong-hut

I have resolved my problem with "OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)": just install ffmpeg (a complete, cross-platform solution to record, convert and stream audio and video).

tiankong-hut avatar May 22 '18 02:05 tiankong-hut