tensorflow-wavenet
Does anyone have a pre-trained model that we can download?
My computer is not very powerful and I don't have access to my university's resources. Could someone share, or could the README.md include, a link where we can download a pre-trained model?
@pfeodrippe please find attached: wavenet_models.zip
Training information:
- GPU used: Titan X, last year's model (12 GB VRAM)
- Runtime: not measured, but less than an hour
- BATCH_SIZE = 1
- Trained on: 'VCTK-Corpus'
- NUM_STEPS = 2000
- LEARNING_RATE = 0.03
- Model parameters: "filter_width": 2, "quantization_steps": 256, "sample_rate": 16000, "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512], "residual_channels": 32, "dilation_channels": 16
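For reference, those model parameters are the kind of thing that goes into the wavenet_params.json file passed to train.py and generate.py. Here is a rough sketch of what such a file would look like with the values above; the exact key names (e.g. quantization_steps vs. quantization_channels) depend on the repository version, so treat this as an assumption rather than the file that shipped with the zip:

```json
{
    "filter_width": 2,
    "quantization_steps": 256,
    "sample_rate": 16000,
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
    "residual_channels": 32,
    "dilation_channels": 16
}
```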
@adroit91 Were you able to generate an output with this? If yes, could you also share the generated file?
No, same issue as you. Still trying to trace it.
@adroit91 Thanks, man! Now let's tackle the generation problem.
It would be nice to have these models and examples in a centralized place, like the repo wiki or a Releases page.
Yes, it'd be nice to have a centralized repository where we can upload all the examples.
What was the loss?
I have tried to collect the generated audio in a wiki: Generated audio samples wiki
We can observe how training converts gibberish into a natural-sounding voice over the iterations.
- Learning rate = 0.002
- NUM_STEPS = 240000 -- but the GPU (Titan X, 12 GB) ran out of memory at 36423 steps.
- SAMPLE_SIZE = 100000
- Loss was around 2-2.1
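As a rough guide, a run with those settings would be launched along these lines. The flag names and the corpus path below are assumptions based on this repository's train.py; check `python train.py --help` on your checkout before relying on them:

```bash
# Sketch of a training run with the settings above; flag names and the
# VCTK path are assumptions -- verify against `python train.py --help`.
python train.py \
    --data_dir=./VCTK-Corpus \
    --num_steps=240000 \
    --learning_rate=0.002 \
    --sample_size=100000
```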
@adroit91 Could you share the different models or at least the last one? Is it the same zip you sent before?
Great work! I'll listen to it after my morning classes
@lemonzi:
> It would be nice to have these models and examples in a centralized place, like the repo wiki or a Releases page.
Are you referring to just the model description, or do you mean the trained model's checkpoint file too? GitHub doesn't want to host large binary files (if I recall the TOS correctly). I was thinking about getting a Dropbox or similar account to put the ckpt file on, but something that anyone could contribute to would be better. Is there some ready solution for this that I'm not aware of?
@jyegerlehner We can post large binaries as 'releases', which GitHub supports. :)
Thanks @nakosung that's news to me. I'm reading about github releases now :)
@jyegerlehner I meant both! Releases would be great for an "official" checkpoint with the default model settings and a decent number of iterations, but are not world-editable.
@adroit91, I have limited computation resources, so I ran this experiment with the current default config on only one person's waves (p225, about 231 waves). I have run the experiment for 35K steps, but the loss is still above 2 and the generated waves are noisy. Is there any issue with my experiment? Thanks very much.
@weixsong Try decreasing the silence threshold to 0.1-0.2 or lower. But be careful: with too low a value it may start generating mostly silence. Decreasing the learning rate by 1-2 orders of magnitude might also help.
After lowering the silence threshold your loss should drop to around 2. I'm not sure, but it looks like the quality of the generated sounds and the loss are not directly related. Waiting for 10-20k more steps might help even if the loss stays at roughly the same level.
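If you want to try that advice from the command line, the relevant knobs are exposed as train.py flags. The flag names below are assumptions from the version of the repository I've seen; confirm them with `python train.py --help`:

```bash
# Lower the silence threshold and the learning rate, per the advice above.
# Flag names and values are assumptions -- confirm with `python train.py --help`.
python train.py \
    --data_dir=./VCTK-Corpus \
    --silence_threshold=0.1 \
    --learning_rate=0.0003
```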
@lemonzi, is this https://soundcloud.com/user-731806733/tensorflow-wavenet-500-msec-88k-train-steps sample generated by the latest code that supports global conditioning on speaker ID?
@akademi4eg, thanks very much for your help. Now I can generate some sound that makes sense.
@weixsong Hi there! @jyegerlehner is the one to ask, he implemented Global Conditioning and submitted this sample ;)
@weixsong No I think that one was trained on only audio from the single speaker. That model preceded conditioning on speaker id.
@jyegerlehner, thanks very much. Do you have any samples generated by a model trained with the latest code? I mean samples generated with speaker ID as the global condition.
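For anyone wanting to reproduce that, global conditioning is driven by extra flags on train.py and generate.py. The flag names and values below (`--gc_channels`, `--gc_cardinality`, `--gc_id`) are my recollection of the global-conditioning PR and may differ in your checkout; the checkpoint path is a placeholder:

```bash
# Train with global conditioning on speaker identity, then generate audio
# for a specific speaker ID. Flag names are assumptions from the GC PR --
# check `python train.py --help` and `python generate.py --help`.
python train.py --data_dir=./VCTK-Corpus --gc_channels=32

python generate.py \
    --samples=16000 \
    --wav_out_path=speaker_311.wav \
    --gc_channels=32 \
    --gc_cardinality=377 \
    --gc_id=311 \
    ./logdir/train/<run>/model.ckpt-<step>
```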
I am a complete newbie here. Could there be an implementation of the trained WaveNet model in C#/.NET?
I am using the System.Speech.Recognition namespace in C#, which is misinterpreting the provided speech as something else (e.g. "Hello" -> "Well oh").
@jyegerlehner did you find a way to upload the trained model's checkpoint?
@ebadawy Sorry, that model is long gone. At the time I was ultimately trying to get local conditioning on text characters (i.e. TTS) to work, and didn't succeed. I didn't think the GC stuff was very interesting by itself.
I don't remember it being difficult to train; the PR has some discussion and links to samples I generated for different speakers. If I recall correctly, I had a machine with a Titan GPU training for a few weeks to produce those samples, so it shouldn't be too hard to reproduce if someone really wanted to.
Also, there are some pretty good open-source implementations of TTS models (e.g. Tacotron) out there now, so I'd guess this WaveNet implementation is not very interesting to most people.
First sounds with the current default settings! :)
I wonder, what is the point of this sample? It sounds just like noise.
On the original demo page they provide samples with voice: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
In issue #307 @DiyuanLu has provided a pre-trained model for 12000 steps.
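If you download one of these pre-trained checkpoints, generation should look roughly like the sketch below. The key point is that the wavenet_params.json you pass must match the hyperparameters the checkpoint was trained with, otherwise restoring the variables will fail. Paths, the step number, and flag names here are assumptions for illustration:

```bash
# Generate audio from a downloaded checkpoint. The wavenet_params.json must
# match the training configuration of that checkpoint. Paths and flag names
# are assumptions -- verify with `python generate.py --help`.
python generate.py \
    --samples=16000 \
    --wav_out_path=generated.wav \
    --wavenet_params=wavenet_params.json \
    ./pretrained/model.ckpt-12000
```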