
I can't run it on Windows 10, could someone help me?

Open flyuuo9 opened this issue 7 years ago • 82 comments

My env is win10 + anaconda2 + python3.5. It's my first time using TensorFlow. The log below looks like something went wrong when parsing hparams/default.yaml. I even tried changing the line endings of default.yaml from LF to Windows CRLF. Could someone help me?

(python35) λ pip show pyyaml
Name: PyYAML
Version: 3.13
Summary: YAML parser and emitter for Python
Home-page: http://pyyaml.org/wiki/PyYAML
Author: Kirill Simonov
Author-email: [email protected]
License: MIT

(python35) λ pip show tensorflow
Name: tensorflow
Version: 1.9.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
D:\proj_github\deep-voice-conversion (master -> origin)
(python35) λ python train1.py case
case: case, logdir: /data/private/vc/logdir/case/train1
[0725 16:52:49 @logger.py:109] WRN Log directory /data/private/vc/logdir/case/train1 exists! Use 'd' to delete it.
[0725 16:52:49 @logger.py:112] WRN If you're resuming from a previous run, you can choose to keep it.
Press any other key to exit.
Select Action: k (keep) / d (delete) / q (quit):d
[0725 16:52:52 @logger.py:74] Argv: train1.py case
[0725 16:52:52 @parallel.py:175] WRN MultiProcessPrefetchData does support windows. However, windows requires more strict picklability on processes, which may lead of failure on some of the code.
[0725 16:52:52 @parallel.py:185] [MultiProcessPrefetchData] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
Process _Worker-1:
Traceback (most recent call last):
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\parallel.py", line 162, in run
    for dp in self.ds.get_data():
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\common.py", line 116, in get_data
    for data in self.ds.get_data():
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 35, in get_data
    yield get_mfccs_and_phones(wav_file=wav_file)
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 72, in get_mfccs_and_phones
    wav = read_wav(wav_file, sr=hp.default.sr)
KeyError: 'default'
Process _Worker-2:
Traceback (most recent call last):
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\parallel.py", line 162, in run
    for dp in self.ds.get_data():
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\dataflow\common.py", line 116, in get_data
    for data in self.ds.get_data():
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 35, in get_data
    yield get_mfccs_and_phones(wav_file=wav_file)
  File "D:\proj_github\deep-voice-conversion\data_load.py", line 72, in get_mfccs_and_phones
    wav = read_wav(wav_file, sr=hp.default.sr)
KeyError: 'default'

[0725 16:52:31 @training.py:101] Building graph for training tower 1 on device /gpu:1 ...
[0725 16:52:34 @collection.py:164] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 3->5)
Traceback (most recent call last):
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1589, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Op type not registered 'NcclAllReduce' in binary running on mywind-PC. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. while building NodeDef 'AllReduceGrads/NcclAllReduce'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/proj_github/deep-voice-conversion/train1.py", line 78, in <module>
    train(args, logdir=logdir_train1)
  File "D:/proj_github/deep-voice-conversion/train1.py", line 60, in train
    launch_train_with_config(train_conf, trainer=trainer)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\train\interface.py", line 81, in launch_train_with_config
    model._build_graph_get_cost, model.get_optimizer)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\utils\argtools.py", line 181, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\train\tower.py", line 173, in setup_graph
    train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\train\trainers.py", line 166, in _setup_graph
    self._make_get_grad_fn(input, get_cost_fn, get_opt_fn), get_opt_fn)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\graph_builder\training.py", line 232, in build
    all_grads = allreduce_grads(all_grads, average=self._average)  # #gpu x #param
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\tfutils\scope_utils.py", line 84, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorpack\graph_builder\utils.py", line 140, in allreduce_grads
    summed = nccl.all_sum(grads)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\contrib\nccl\python\ops\nccl_ops.py", line 47, in all_sum
    return _apply_all_reduce('sum', tensors)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\contrib\nccl\python\ops\nccl_ops.py", line 228, in _apply_all_reduce
    shared_name=shared_name))
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\contrib\nccl\ops\gen_nccl_ops.py", line 58, in nccl_all_reduce
    num_devices=num_devices, shared_name=shared_name, name=name)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
    op_def=op_def)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1756, in __init__
    control_input_ops)
  File "C:\Users\mywind\AppData\Local\conda\conda\envs\python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1592, in _create_c_op
    raise ValueError(str(e))
ValueError: Op type not registered 'NcclAllReduce' in binary running on mywind-PC. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. while building NodeDef 'AllReduceGrads/NcclAllReduce'

Process finished with exit code 1

flyuuo9 avatar Jul 25 '18 08:07 flyuuo9

Excuse me, did you solve this problem? I have the same problem.

Traceback (most recent call last):
  File "C:\Users\blcdec\ApplicationInstallPlace\Anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\blcdec\ApplicationInstallPlace\Anaconda3\envs\tensorflow\lib\site-packages\tensorpack\dataflow\parallel.py", line 162, in run
    for dp in self.ds.get_data():
  File "C:\Users\blcdec\ApplicationInstallPlace\Anaconda3\envs\tensorflow\lib\site-packages\tensorpack\dataflow\common.py", line 116, in get_data
    for data in self.ds.get_data():
  File "C:\Users\blcdec\project\deep-voice-conversion-master\deep-voice-conversion-master\data_load.py", line 35, in get_data
    yield get_mfccs_and_phones(wav_file=wav_file)
  File "C:\Users\blcdec\project\deep-voice-conversion-master\deep-voice-conversion-master\data_load.py", line 72, in get_mfccs_and_phones
    wav = read_wav(wav_file, sr=hp.default.sr)
KeyError: 'default'

BlcDec avatar Aug 10 '18 12:08 BlcDec

@bhui I'm having the same problem on Windows 10.

ValueError: Op type not registered 'NcclAllReduce' in binary running on DESK. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. while building NodeDef 'AllReduceGrads/NcclAllReduce'

I've now tried at least 8 different repos for trying to learn voice cloning, and none of them have good enough documentation for me to get them working. I'm super inspired by all of the examples but haven't had much luck.

ryancwalsh avatar Aug 15 '18 16:08 ryancwalsh

Same problem here. @bhui Have you solved this problem? Or could anyone please provide a solution? Thanks a lot!

dawningo avatar Aug 19 '18 08:08 dawningo

I have been unable to find a solution, but after thorough troubleshooting I have found the problem. The project relies on NCCL, which is not supported on Windows. I don't know enough Python or TensorFlow (I'm new to both) to edit the code and exclude the calls to NCCL, so my "solution" was to dual-boot Linux (Ubuntu) on my system.

Unfortunately, it seems that unless NVIDIA releases NCCL for Windows or major changes are made to the code for this project, it can only be run on a Linux system that supports NCCL.
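
For anyone who can live with a single GPU, one possible way around the NCCL dependency is to pick a trainer that never builds the NcclAllReduce op. The sketch below assumes NCCL is only usable on Linux, as reported above. `SimpleTrainer` and `SyncMultiGPUTrainerReplicated` are tensorpack classes; the helper names `nccl_usable` and `pick_trainer` are made up for this illustration, and the project's training scripts may be organized differently.

```python
import platform


def nccl_usable(num_gpu, system=None):
    """Return True only when multi-GPU NCCL all-reduce is plausible."""
    system = system or platform.system()
    return num_gpu > 1 and system == "Linux"


def pick_trainer(num_gpu):
    """Hypothetical helper: choose a tensorpack trainer for this machine."""
    # Imported lazily so the decision logic can be checked without tensorpack.
    from tensorpack import SimpleTrainer, SyncMultiGPUTrainerReplicated

    if nccl_usable(num_gpu):
        return SyncMultiGPUTrainerReplicated(num_gpu)
    # Single-tower training: no cross-GPU gradient all-reduce is built.
    return SimpleTrainer()
```

The returned object would then be passed as `trainer=` to `launch_train_with_config`, which is the call that appears in the traceback above. On Windows this falls back to one GPU even if more are installed.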

CIDFarwin avatar Aug 23 '18 03:08 CIDFarwin

I'm running on Windows with a single GPU. You should migrate all the code that uses hparam.py; I changed all the code to use hparams.py. In most of the code you just have to change default to Default. There are missing properties in Default and TrainX in hparams.py, so copy and paste the properties from hparam.py and replace the : with =.

The NcclAllReduce error may be caused by missing wav files or an incorrect dataset path; verify the path in hparams.py. The other cause of the NcclAllReduce error is using more than 1 GPU on Windows.

My hparams.py, hope it helps. hparams.zip
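
A rough sketch of the "replace the : with =" step, for anyone doing the migration by hand. A possible reason the static hparams.py helps (an assumption, not confirmed in this thread): on Windows, multiprocessing starts workers by spawning a fresh interpreter, so values loaded into a module-level object at runtime in the parent process are not inherited by the workers, whereas class attributes exist as soon as the module is imported. The function name `yaml_to_assignments` and the `n_fft` value are hypothetical; only `sr = 16000` matches a value mentioned in this thread.

```python
def yaml_to_assignments(yaml_text):
    """Turn flat 'key: value' lines into 'key = value' lines.

    Only simple scalars are handled; section headers, comments and
    blank lines are passed through unchanged.
    """
    out = []
    for line in yaml_text.splitlines():
        key, sep, value = line.partition(":")
        is_comment = line.lstrip().startswith("#")
        if sep and key.strip() and value.strip() and not is_comment:
            indent = line[: len(line) - len(line.lstrip())]
            out.append("{}{} = {}".format(indent, key.strip(), value.strip()))
        else:
            out.append(line)
    return "\n".join(out)


class Default:
    # Class attributes are defined at import time, so worker processes
    # see them without any runtime loading step.
    sr = 16000   # was "sr: 16000"
    n_fft = 512  # hypothetical value, for illustration only
```

Code that previously read `hp.default.sr` would then read `Default.sr` (or however the module is imported). Unquoted string values in the YAML still need quotes added by hand to become valid Python.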

carlfm01 avatar Sep 12 '18 01:09 carlfm01

Well, I tried on Windows 7 with your hparams and it seems to work, but now I'm getting a bunch of encoding errors, and it seems like it's not finding the dataset properly. (It's also saving to the wrong logdir.) Did you have this problem? If so, how did you fix it?

CIDFarwin avatar Sep 13 '18 20:09 CIDFarwin

encoding error

CIDFarwin avatar Sep 13 '18 20:09 CIDFarwin

Yes, as you said, this error is related to the path of the dataset. I'm new to Python and TF, so I created a little project that uses glob.glob to try to load the wav files, and I discovered that I have to delete the / at the start of the path. Here is one example of my path. pythontest.zip
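
A small check along the same lines, as a sketch rather than the contents of the attached project. On Windows a path that starts with / is resolved against the root of the current drive, so a Linux-style dataset path can silently match nothing. The helper name `find_wavs` and the default pattern are assumptions for this example.

```python
import glob
import os


def find_wavs(data_dir, pattern="**/*.wav"):
    """Return sorted wav paths under data_dir, or raise if none match."""
    full_pattern = os.path.join(data_dir, pattern)
    files = sorted(glob.glob(full_pattern, recursive=True))
    if not files:
        # Show the absolute form so a wrong drive or root is easy to spot.
        raise FileNotFoundError(
            "No files match {!r} (resolved to {!r})".format(
                full_pattern, os.path.abspath(full_pattern)
            )
        )
    return files
```

Running this on the dataset directory before training fails early with the resolved path, instead of failing later inside the data loader.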

carlfm01 avatar Sep 13 '18 22:09 carlfm01

For the logdir, I had to stop using the case name. I'm now fixing that so it uses the case name from the command-line arguments.

carlfm01 avatar Sep 13 '18 22:09 carlfm01

@CIDFarwin Did you get output with Python 3.5 on Windows?

carlfm01 avatar Sep 20 '18 23:09 carlfm01

I'm still getting the same errors. I've used your test script and I'm finding the files, and I'm using the same datapath, so I don't know what's going on. I'm wondering if it is some sort of encoding problem, but I don't know why nobody else seems to have that problem (and it works on my linux build)

CIDFarwin avatar Sep 21 '18 06:09 CIDFarwin

Did you install ffmpeg (https://www.ffmpeg.org/)? Let me know. I'm using the latest version, but it looks like something on Linux is generating different arrays 😞

carlfm01 avatar Sep 22 '18 09:09 carlfm01

@CIDFarwin I cloned the master branch again and got this error. Try replacing line 81 of data_load.py with phn_file = wav_file.replace("WAV", "PHN").replace("wav", "PHN"). I hope that fixes it.
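
To show what that replacement does, here is a short sketch. The function names are invented for this example, and the second variant is an alternative that only touches the file extension; it is not what the repository uses.

```python
import os


def phn_path_by_replace(wav_file):
    """The replacement suggested above: handles both WAV and wav."""
    return wav_file.replace("WAV", "PHN").replace("wav", "PHN")


def phn_path_by_extension(wav_file):
    """Alternative sketch: swap only the extension, leave folders alone."""
    root, _ext = os.path.splitext(wav_file)
    return root + ".PHN"


print(phn_path_by_replace("TIMIT/TRAIN/DR1/FCJF0/SA1.WAV"))  # TIMIT/TRAIN/DR1/FCJF0/SA1.PHN
print(phn_path_by_replace("data/wav/sa1.wav"))               # data/PHN/sa1.PHN
print(phn_path_by_extension("data/wav/sa1.wav"))             # data/wav/sa1.PHN
```

The chained replace also rewrites any folder named wav or WAV in the path, which is harmless for the usual TIMIT layout but worth knowing if the dataset lives under such a folder.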

carlfm01 avatar Sep 24 '18 05:09 carlfm01

Well, it's finally working with Python 3.6 on Windows.

individualAudio1.zip

carlfm01 avatar Sep 25 '18 18:09 carlfm01

Huh,

I'm sure I tried that already, but that seems to have fixed it. I'll let it run for a bit and let you know how my output looks.

Thanks a bunch!

CIDFarwin avatar Sep 27 '18 17:09 CIDFarwin

@carlfm01 I'm very encouraged but still very confused about your comment about hparams.py. https://github.com/andabi/deep-voice-conversion/issues/52#issuecomment-420484728

I'm on Windows 10.

I see these files (among others, of course):

  • deep-voice-conversion\hparams\default.yaml
  • deep-voice-conversion\hparams\hparams.yaml
  • deep-voice-conversion\hparam.py

I see that you shared a file called hparams.py, but I'm not sure where to save it.

If you wouldn't mind, I'd love if you could clarify each step that you wrote here:

You should migrate all the code that uses hparam.py; I changed all the code to use hparams.py. In most of the code you just have to change default to Default. There are missing properties in Default and TrainX in hparams.py, so copy and paste the properties from hparam.py and replace the : with =.

Here is my guess about what you were saying:

  1. Download my attached hparams.py file to your project folder.
  2. Throughout the project, change any code that mentions "hparam.py" to instead say "hparams.py". [Which files?]
  3. [I don't understand the rest]

cc @CIDFarwin

Thanks!

ryancwalsh avatar Sep 29 '18 16:09 ryancwalsh

@ryancwalsh Take a look at https://github.com/carlfm01/deep-voice-conversion. I also changed the code of the lambda in convert to make it work on Python 3.x. Let me know if you still don't understand something.

carlfm01 avatar Sep 29 '18 18:09 carlfm01

I can confirm that it is working on Windows. I have an output, but it's not very good.

I have another problem now: it seems TensorFlow is only running on my CPU and not my GPU, and training is taking a very long time (much longer than on my Linux build). I know the scripts are finding my GPU because I see the line "Created TensorFlow device" with my GPU listed, with compute capability 5.2.

CIDFarwin avatar Oct 03 '18 19:10 CIDFarwin

Hi @CIDFarwin. You can check GPU usage with nvidia-smi, located in C:\Program Files\NVIDIA Corporation\NVSMI. Check the usage with this tool and let me know. Also check whether you set gpu to 1 in the params. I have the same problem on Linux: on my Windows machine GPU usage is always at 40%-99%, but on Linux it is always at 0%-15%. Do you have to use allow_soft_placement in the Linux build? For me, on Linux just creating the session takes 5 minutes or more (same hardware). I've tried a lot of config changes, but no luck yet.
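
If you want to log the usage while training instead of watching the console, a sketch like the following can poll nvidia-smi from Python. It assumes nvidia-smi is on PATH (on Windows you may need to add the NVSMI folder above). The helper names `parse_gpu_usage` and `read_gpu_usage` are made up for this example.

```python
import subprocess

# Ask nvidia-smi for plain CSV: GPU utilization (%) and used memory (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_usage(csv_text):
    """Parse 'util, mem' lines into a list of (util_percent, mem_mib)."""
    rows = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        rows.append((int(util), int(mem)))
    return rows


def read_gpu_usage():
    """Run nvidia-smi once and return one tuple per GPU."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_usage(out)
```

Calling `read_gpu_usage()` every few seconds during an epoch gives a quick view of whether the GPU is actually doing the work.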

carlfm01 avatar Oct 03 '18 20:10 carlfm01

@carlfm01

@ryancwalsh Take a look at https://github.com/carlfm01/deep-voice-conversion. I also changed the code of the lambda in convert to make it work on Python 3.x. Let me know if you still don't understand something.

I ran train1.py using that code, but train2.py always raises MemoryError. My machine has 16 GB of memory. The memory usage is below 90% in the monitor view.

[1114 00:02:55 @base.py:227] Creating the session ...
2018-11-14 00:02:55.852710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-11-14 00:02:56.056995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-11-14 00:02:56.063291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-14 00:02:56.971796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-14 00:02:56.982418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2018-11-14 00:02:56.985385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2018-11-14 00:02:56.988684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8788 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
[1114 00:02:59 @base.py:233] Initializing the session ...
[1114 00:02:59 @sessinit.py:117] Restoring checkpoint from cases/None/train1\model-100 ...
[1114 00:02:59 @base.py:240] Graph Finalized.
[1114 00:02:59 @concurrency.py:37] Starting EnqueueThread QueueInput/input_queue ...
[1114 00:02:59 @graph.py:73] Running Op sync_variables/sync_variables_from_main_tower ...
[1114 00:03:01 @base.py:272] Start Epoch 1 ...
  0%|          |0/100[00:00<?,?it/s]
2018-11-14 00:03:10.004190: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[1114 00:03:10 @input_source.py:168] ERR Exception in EnqueueThread QueueInput/input_queue:
Traceback (most recent call last):
  File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\input_source\input_source.py", line 158, in run
    dp = next(self._itr)
  File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\dataflow\common.py", line 355, in __iter__
    for dp in self.ds:
  File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\dataflow\parallel.py", line 199, in __iter__
    dp = self.queue.get()
  File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\queues.py", line 94, in get
    res = self._recv_bytes()
  File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\connection.py", line 318, in _recv_bytes
    return self._get_more_data(ov, maxsize)
  File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\connection.py", line 340, in _get_more_data
    ov, err = _winapi.ReadFile(self._handle, left, overlapped=True)
MemoryError
2018-11-14 00:03:10.166847: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.80G (1932735232 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2018-11-14 00:03:10.296870: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.62G (1739461632 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[1114 00:03:10 @common.py:147] ERR Cannot batch data. Perhaps they are of inconsistent shape?
Traceback (most recent call last):
  File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\dataflow\common.py", line 145, in _aggregate_batch
    np.asarray([x[k] for x in data_holder], dtype=tp))
  File "C:\Users\Mloong\AppData\Roaming\Python\Python36\site-packages\numpy\core\numeric.py", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
MemoryError
[1114 00:03:10 @common.py:150] ERR Shape of all arrays to be batched: [(334, 90),
 (334, 90),
 (334, 90),

flyuuo9 avatar Nov 13 '18 16:11 flyuuo9

Hi @bhui, it looks like the error is related to incorrectly formatted or corrupted files. Make sure your training data for the second network is sampled at 16000 Hz, mono, and in wav format. Also try different versions of numpy. I recommend using conda environments :).
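
A quick way to verify the format of each file, sketched with the standard-library wave module (it reads PCM wav files only, so a file it cannot open is itself a sign of a format problem). The helper name `check_wav` is an assumption for this example.

```python
import wave


def check_wav(path, expected_sr=16000, expected_channels=1):
    """Return a list of problems found in a PCM wav file (empty if fine)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_sr:
            problems.append(
                "sample rate is {}, expected {}".format(
                    wf.getframerate(), expected_sr
                )
            )
        if wf.getnchannels() != expected_channels:
            problems.append(
                "channel count is {}, expected {}".format(
                    wf.getnchannels(), expected_channels
                )
            )
    return problems
```

Looping this over the train2 dataset and printing any non-empty result points at the files that need resampling or downmixing.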

carlfm01 avatar Nov 13 '18 19:11 carlfm01

@carlfm01 Thank you! It was finally solved by upgrading the memory from 16 GB to 32 GB. But I have a new question: I ran python convert.py, and no wav file was generated. How can I view the result of convert.py?
I checked convert.py, and the output should be generated in cases/None/train2, but I can't find any output files.

flyuuo9 avatar Nov 15 '18 10:11 flyuuo9

To see the result you have to use TensorBoard. Go to the "cases/None" directory and run tensorboard --logdir=train2 on the command line. Then open the URL that the console prints. The page has an Audio tab where you should see the generated audio.

carlfm01 avatar Nov 15 '18 15:11 carlfm01

@carlfm01 Thanks! You helped me a lot. I am a beginner, just getting into tensorflow.

flyuuo9 avatar Nov 16 '18 07:11 flyuuo9

@carlfm01 I got this problem and still have no idea what causes it. Is there something wrong with data_load.py?

jiyuay avatar Dec 05 '18 09:12 jiyuay

It looks like there is something wrong with line 33 in data_load.py.

jiyuay avatar Dec 05 '18 10:12 jiyuay

@wuzhiyu666 Maybe an incorrect path. Share an example path to one of your train1 wav files, and also the path that you are using in the code.

carlfm01 avatar Dec 05 '18 16:12 carlfm01

@carlfm01 The wav files are in C:\Users\SANDSTORM\Desktop\deep-voice-conversion-master\deep-voice-conversion-master\datasets\arctic\bdl

and the code C:\Users\SANDSTORM\Desktop\deep-voice-conversion-master\deep-voice-conversion-master

is that what you mean? I am a fresher

jiyuay avatar Dec 06 '18 00:12 jiyuay

@wuzhiyu666 Sorry, I meant the TIMIT data. The path you posted is for the second net.

carlfm01 avatar Dec 06 '18 00:12 carlfm01

Sorry, I have not downloaded the TIMIT data yet. I will download it and then let you know. Thank you very much!

jiyuay avatar Dec 06 '18 01:12 jiyuay