tensorflow-wavenet

Does anyone have a pre-trained model that we can download?

Open · pfeodrippe opened this issue Sep 16 '16 · 26 comments

My computer is not very powerful and I don't have access to compute resources at my university. Could someone share a pre-trained model, or could the README.md link to one we can download?

pfeodrippe avatar Sep 16 '16 13:09 pfeodrippe

@pfeodrippe please find attached: wavenet_models.zip

Training information:

- GPU used: Titan X, last year's model (12 GB VRAM)
- Runtime: not measured, but less than an hour
- BATCH_SIZE = 1
- Trained on: VCTK-Corpus
- NUM_STEPS = 2000
- LEARNING_RATE = 0.03
- "filter_width": 2
- "quantization_steps": 256
- "sample_rate": 16000
- "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
- "residual_channels": 32
- "dilation_channels": 16

adroit91 avatar Sep 16 '16 14:09 adroit91
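For reference, the settings adroit91 reports above can be collected into a parameter file like the repo's wavenet_params.json. The sketch below is a reconstruction from the comment only: the key names are copied verbatim from it, any parameter not mentioned is omitted rather than guessed, and the output filename is just the conventional one.

```python
# Reconstruction of the reported hyperparameters as a Python dict,
# written out as JSON. Keys are taken from adroit91's comment above;
# anything not listed there is intentionally left out.
import json

wavenet_params = {
    "filter_width": 2,
    "quantization_steps": 256,   # 256-way (8-bit) mu-law output distribution
    "sample_rate": 16000,
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
    "residual_channels": 32,
    "dilation_channels": 16,
}

with open("wavenet_params.json", "w") as f:
    json.dump(wavenet_params, f, indent=2)
```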

@adroit91 Were you able to generate an output with this? If yes, could you also share the generated file?

sjain07 avatar Sep 16 '16 14:09 sjain07

No, same issue as you. Still trying to trace it.

adroit91 avatar Sep 16 '16 14:09 adroit91

@adroit91 Thanks, man! Now let's attack the generating problem

pfeodrippe avatar Sep 16 '16 18:09 pfeodrippe
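For anyone debugging the generation problem with the shared checkpoint, one quick sanity check is to list what variables the checkpoint actually contains before restoring it into a rebuilt graph. The sketch below is a generic TensorFlow 1.x pattern, not this repo's generate.py, and the checkpoint path is a placeholder standing in for whatever is inside wavenet_models.zip.

```python
# Generic TF 1.x checkpoint inspection: confirm the variable names and shapes
# in the downloaded checkpoint match the model you rebuild before restoring.
import tensorflow as tf

ckpt_path = "wavenet_models/model.ckpt-2000"  # placeholder path, adjust to the zip contents

reader = tf.train.NewCheckpointReader(ckpt_path)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)

# Restoring then follows the usual pattern once the graph has been rebuilt
# with the same wavenet_params used for training:
#   saver = tf.train.Saver(var_list=tf.trainable_variables())
#   with tf.Session() as sess:
#       saver.restore(sess, ckpt_path)
```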

First sounds with the current default settings! :)

generated_18Sep_1714.wav.zip

adroit91 avatar Sep 18 '16 14:09 adroit91

It would be nice to have these models and examples in a centralized place, like the repo wiki or a Releases page.

lemonzi avatar Sep 19 '16 17:09 lemonzi

Yes, it'd be good to have a centralized repository where we can upload all the examples.

Zeta36 avatar Sep 26 '16 06:09 Zeta36

What was the loss?

nakosung avatar Sep 26 '16 23:09 nakosung

I have tried to compile the generated audio in a wiki: Generated audio samples wiki

We can observe how, over the iterations, training turns gibberish into natural-sounding voice.

- Learning rate = 0.002
- NUM_STEPS = 240000 (but the GPU, a Titan X 12 GB, ran out of memory at 36,423 steps)
- SAMPLE_SIZE = 100000
- Loss was around 2-2.1

adroit91 avatar Sep 27 '16 06:09 adroit91
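For context on the loss values quoted in this thread: the model predicts one of 256 mu-law quantization levels per sample, so a uniform (chance-level) prediction has a cross-entropy of ln(256) ≈ 5.55 nats. A loss around 2.0 therefore means the model is well above chance even when the audio still sounds rough. A quick check:

```python
# Chance-level cross-entropy for a 256-way categorical output,
# as used by WaveNet's 8-bit mu-law quantization.
import math

quantization_levels = 256
uniform_loss = math.log(quantization_levels)  # ~5.545 nats
print(f"uniform baseline: {uniform_loss:.3f} nats, reported training loss: ~2.0-2.1")
```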

@adroit91 Could you share the different models or at least the last one? Is it the same zip you sent before?

Great work! I'll listen to it after my morning classes

pfeodrippe avatar Sep 27 '16 14:09 pfeodrippe

@lemonzi

It would be nice to have these models and examples in a centralized place, like the repo wiki or a Releases page.

Are you referring to just the model description, or do you mean the trained model's checkpoint file too? GitHub doesn't want to host large binary files (if I recall the TOS correctly). I was thinking about getting a Dropbox or similar account to put the ckpt file on, but something that anyone could contribute to would be better. Is there some ready-made solution for this that I'm not aware of?

jyegerlehner avatar Sep 27 '16 18:09 jyegerlehner

@jyegerlehner We may post large binaries as 'releases' which github supports. :)

nakosung avatar Sep 27 '16 19:09 nakosung

Thanks @nakosung that's news to me. I'm reading about github releases now :)

jyegerlehner avatar Sep 27 '16 19:09 jyegerlehner

@jyegerlehner I meant both! Releases would be great for an "official" checkpoint with the default model settings and a decent number of iterations, but are not world-editable.

lemonzi avatar Sep 27 '16 19:09 lemonzi

@adroit91, I have limited compute resources, so I ran this experiment with the current default config on only one speaker's recordings (p225, about 231 waves). I have trained for 35K steps, but the loss is still above 2 and the generated waves are noisy. Is there any issue with my experiment? Thanks very much.

weixsong avatar Jan 09 '17 01:01 weixsong

@weixsong Try decreasing the silence threshold to 0.1-0.2 or lower. But be careful: with too low a value it may start generating mostly silence. Decreasing the learning rate by 1-2 orders of magnitude might also help.

After lowering the silence threshold your loss should drop to around 2. I'm not sure, but it looks like the quality of the generated sounds and the loss are not directly related. Waiting another 10-20k steps might help even if the loss stays at roughly the same level.

akademi4eg avatar Jan 09 '17 11:01 akademi4eg
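To make the silence-threshold advice above concrete: the threshold decides which quiet samples are stripped from the training audio before it is fed to the model. The sketch below illustrates the idea as simple amplitude-based trimming; the repository's audio reader may implement it differently (for example using frame energy), so treat this as an illustration rather than the project's code.

```python
# Standalone sketch of silence trimming by amplitude threshold.
# Illustrates what a lower threshold keeps or discards; not the repo's audio_reader.
import numpy as np

def trim_silence(audio, threshold=0.1):
    """Remove leading/trailing samples quieter than `threshold` (audio scaled to [-1, 1])."""
    loud = np.nonzero(np.abs(audio) > threshold)[0]
    if loud.size == 0:
        return audio[:0]  # the whole clip counts as silence
    return audio[loud[0]:loud[-1] + 1]

# Example: a clip that is quiet at both ends.
clip = np.concatenate([np.zeros(100), 0.5 * np.ones(50), np.zeros(100)])
print(len(trim_silence(clip, threshold=0.1)))  # -> 50
```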

@lemonzi, is this wave https://soundcloud.com/user-731806733/tensorflow-wavenet-500-msec-88k-train-steps generated by the latest code that supports global conditioning on speaker ID?

weixsong avatar Jan 13 '17 09:01 weixsong

@akademi4eg, thanks very much for your help. Now I can generate some sound that makes sense.

weixsong avatar Jan 13 '17 09:01 weixsong

@weixsong Hi there! @jyegerlehner is the one to ask, he implemented Global Conditioning and submitted this sample ;)

lemonzi avatar Jan 13 '17 16:01 lemonzi

@weixsong No, I think that one was trained on audio from only a single speaker. That model preceded conditioning on speaker ID.

jyegerlehner avatar Jan 13 '17 17:01 jyegerlehner

@jyegerlehner, thanks very much. Do you have any audio generated by a model trained with the latest code? I mean audio generated with a speaker ID as the global condition.

weixsong avatar Jan 16 '17 08:01 weixsong
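For readers wondering what "global conditioning on speaker ID" means concretely: the speaker ID is looked up in an embedding table, and that embedding is added as an extra bias inside each gated activation, z = tanh(W_f x + V_f h) * sigmoid(W_g x + V_g h), where h is the speaker embedding broadcast over time. The sketch below is a simplified NumPy illustration of that idea (the dilated convolution is replaced by a per-timestep linear map for brevity), not the implementation in this repository or in the PR discussed above.

```python
# Minimal NumPy sketch of a WaveNet-style gated activation with global conditioning.
# h is a per-utterance speaker embedding broadcast over all timesteps.
import numpy as np

def gated_unit(x, h, W_f, W_g, V_f, V_g):
    """x: (time, channels) layer input, h: (embed_dim,) speaker embedding."""
    filt = np.tanh(x @ W_f + h @ V_f)                    # filter path, speaker-conditioned
    gate = 1.0 / (1.0 + np.exp(-(x @ W_g + h @ V_g)))    # gate path (sigmoid)
    return filt * gate

rng = np.random.default_rng(0)
time_steps, channels, embed_dim = 8, 32, 16
x = rng.normal(size=(time_steps, channels))
h = rng.normal(size=(embed_dim,))                        # embedding looked up from the speaker ID
W_f, W_g = rng.normal(size=(2, channels, channels))
V_f, V_g = rng.normal(size=(2, embed_dim, channels))
print(gated_unit(x, h, W_f, W_g, V_f, V_g).shape)        # -> (8, 32)
```

Because h is constant over the whole utterance, the same network weights produce different voices depending only on the ID, which is what distinguishes global conditioning from local conditioning on, say, text.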

I am a complete newbie here. Could there be an implementation of the trained WaveNet model in C#/.NET?

I am using the System.Speech.Recognition module in C#, which is misinterpreting the provided speech as something else (e.g. "Hello" = "Well oh").

JumpsPumps avatar Jul 11 '18 11:07 JumpsPumps

@jyegerlehner did you find a way to upload the trained model's checkpoint?

ebadawy avatar Mar 01 '19 15:03 ebadawy

@ebadawy Sorry, that model is long gone. At the time I was ultimately trying to get local conditioning on text characters (i.e. TTS) to work, and didn't succeed. I didn't think the global-conditioning stuff was very interesting by itself.

I don't remember it being difficult to train; the PR has some discussion and links to samples I generated for different speakers. If I recall correctly, I had a machine with a Titan GPU training for a few weeks to produce those samples, so it shouldn't be too hard to reproduce if someone really wanted to.

Also, there are some pretty good open-source implementations of TTS models (e.g. Tacotron) out there now, so I'd guess this WaveNet implementation is not very interesting to most people.

jyegerlehner avatar Mar 06 '19 21:03 jyegerlehner

First sounds with the current default settings! :)

generated_18Sep_1714.wav.zip

I wonder what the point of this sample is? It sounds just like noise.

The original demo page provides samples with actual voice: https://deepmind.com/blog/wavenet-generative-model-raw-audio/

mrgloom avatar Mar 14 '19 19:03 mrgloom

In issue #307, @DiyuanLu has provided a model pre-trained for 12,000 steps.

greysou1 avatar Aug 29 '19 09:08 greysou1