
Nancy Corpus pre-trained

Open LearnedVector opened this issue 7 years ago • 28 comments

Is the Nancy corpus pre-trained model available anywhere for use? I think the one provided in the README is the LJ Speech trained model.

LearnedVector avatar Dec 18 '17 16:12 LearnedVector

@MXGray: would you be willing to share your pre-trained model on the Nancy corpus?

keithito avatar Dec 19 '17 22:12 keithito

@keithito @Mn0491

No problem - Here you go: https://github.com/keithito/tacotron/issues/15#issuecomment-342632496

MXGray avatar Dec 20 '17 00:12 MXGray

@MXGray Thank you so much!

The one that you linked to doesn't seem to perform as well as the samples at https://keithito.github.io/audio-samples/. Are they the same model? Here is a sample of what the model output:

  • "President Trump met with other leaders at the Group of 20 conferences."
  • https://instaud.io/1yDS (audio sample)

This is with the default demo_server.py, pointing the checkpoint to the Nancy corpus model you provided.

Thanks again for posting the link, and thank you @keithito for this awesome project.

LearnedVector avatar Dec 20 '17 01:12 LearnedVector

@Mn0491

Oh, my bad - That's the Tagalog model that I trained on top of the Nancy model. I'll upload the correct Nancy model tonight when I'm in front of my laptop and will post it here.

MXGray avatar Dec 20 '17 03:12 MXGray

@MXGray could you please upload the English model? The model at the Drive link you referenced doesn't generate English speech.

geneing avatar Dec 31 '17 23:12 geneing

@Mn0491 @geneing Sorry guys, crazy holidays. :) Here you go - Happy 2018! https://drive.google.com/file/d/1c_O-Gha03_erKbilsFCvs9QJ8faJ7ou8/view?usp=sharing

MXGray avatar Jan 01 '18 02:01 MXGray

@MXGray thank you! Happy 2018!

LearnedVector avatar Jan 01 '18 02:01 LearnedVector

@MXGray Thank you. It works now.

geneing avatar Jan 02 '18 06:01 geneing

@t3t3t3 Are you training a model on top of this? If so, then after preprocessing your training data you'll get a max output length. Divide this by outputs_per_step and use the result for max_iters in hparams.py ... Hope this helps. :)

MXGray avatar Jan 02 '18 08:01 MXGray

Thanks! I think I messed up the parameters somewhere. I reinstalled from a clean copy of the source code and it works now!

t3t3t3 avatar Jan 02 '18 08:01 t3t3t3

Excuse me. Can you tell me how to download the Nancy Corpus dataset from Blizzard 2011 at CSTR? I cannot even find the registration page. @MXGray @keithito

begeekmyfriend avatar Jan 16 '18 16:01 begeekmyfriend

@begeekmyfriend Hello! You need to click "license", which redirects to the page below. Fill in the form and wait a few days for it to be approved. You'll get 2 emails in total.

http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/license.html

gloriouskilka avatar Jan 16 '18 16:01 gloriouskilka

@gloriouskilka Thanks a lot. The Nancy Corpus dataset seems better for training English.

begeekmyfriend avatar Jan 16 '18 16:01 begeekmyfriend

@MXGray Thanks! Do you mind if I upload your trained model to a public GitHub repository? I'd like to make a Docker container for running Tacotron. (curl doesn't play well with Google Drive links)
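(For anyone hitting the same curl issue, here is a minimal Python sketch of the usual Google Drive confirm-token workaround. The file ID and output filename below are placeholders, and Drive's download flow changes over time, so treat this as an assumption rather than a guaranteed recipe.)

```python
import requests

# Placeholder: take <FILE_ID> from a drive.google.com/file/d/<FILE_ID>/view URL
FILE_ID = "YOUR_FILE_ID"
URL = "https://drive.google.com/uc?export=download"

session = requests.Session()
response = session.get(URL, params={"id": FILE_ID}, stream=True)

# Large files return an interstitial warning page; the confirm token is in a cookie
token = next((v for k, v in response.cookies.items()
              if k.startswith("download_warning")), None)
if token:
    response = session.get(URL, params={"id": FILE_ID, "confirm": token}, stream=True)

# Placeholder output name
with open("nancy_pretrained.tar", "wb") as f:
    for chunk in response.iter_content(chunk_size=32768):
        if chunk:
            f.write(chunk)
```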

js0nwu avatar Jan 31 '18 01:01 js0nwu

@ArkaneCow No, I don't mind. It'll be very helpful. Please share the link here once it's up. Thanks. :)

MXGray avatar Feb 02 '18 01:02 MXGray

@MXGray Thanks! I created a Docker for running this repository here: https://hub.docker.com/r/arkanecow/dockerfile-keithito-tacotron/ The Docker file is here: https://github.com/ArkaneCow/dockerfile-keithito-tacotron The repository where the models are hosted is here: https://github.com/ArkaneCow/tacotron-models

js0nwu avatar Feb 04 '18 06:02 js0nwu

@MXGray Could you please clarify, did you use the default parameters for training on Nancy corpus? Thanks in advance!

quadraaa avatar Feb 21 '18 11:02 quadraaa

@MXGray Thanks for your contribution. I listened to the demo audio for both LJSpeech and Nancy, and found that Nancy is better.

I downloaded the files from the official Nancy website you provided, but I do not know how to handle them. Therefore, I downloaded the wavs and text from here:

https://github.com/barronalex/Tacotron/blob/master/download_data.sh

I converted the contents of prompts.data into the metadata.csv format, e.g.: APDC2-017-01|Children act prematurely.|Children act prematurely.
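A minimal sketch of that conversion, assuming each prompts.data line is Festival-style, i.e. ( APDC2-017-01 "Children act prematurely." ) - adjust the regex if your copy of the corpus differs:

```python
import re

# Assumed input format per line: ( APDC2-017-01 "Children act prematurely." )
LINE_RE = re.compile(r'\(\s*(\S+)\s+"(.*)"\s*\)')

with open("prompts.data", encoding="utf-8") as fin, \
     open("metadata.csv", "w", encoding="utf-8") as fout:
    for line in fin:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't match the assumed format
        utt_id, text = m.groups()
        # LJSpeech-style metadata.csv: id|raw text|normalized text
        fout.write(f"{utt_id}|{text}|{text}\n")
```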

However, I got an error while running train.py:

Starting new training run at commit: None
Generated 32 batches of size 32 in 2.455 sec
Traceback (most recent call last):
  File "/home/chris/new_2018/u41_2_nancy/tacotron/tacotron/datasets/datafeeder.py", line 74, in run
    self._enqueue_next_group()
  File "/home/chris/new_2018/u41_2_nancy/tacotron/tacotron/datasets/datafeeder.py", line 96, in _enqueue_next_group
    self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "/home/chris/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/chris/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1096, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32, 665, 64) for Tensor 'datafeeder/mel_targets:0', which has shape '(?, ?, 80)'

Is there anything I should edit in the original code written by @keithito to account for the differences between the LJSpeech and Nancy corpora?

ghost avatar Mar 28 '18 09:03 ghost

@Quadraaa @DavidAksnes

All default parameters in hparams.py, except max_iters. This value should be set to the max output length divided by outputs_per_step. For example - after preprocessing the Nancy dataset, let's say you get 1605 as the max output length; the default outputs_per_step is 5, so:

1605 / 5 = 321 (this should be the value of max_iters)
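
In code form, the same calculation looks roughly like this (the 1605 figure is just the example above; use the max output length your own preprocessing reports):

```python
import math

max_output_length = 1605   # example value; read yours from the preprocessing output
outputs_per_step = 5       # default outputs_per_step in hparams.py

max_iters = math.ceil(max_output_length / outputs_per_step)
print(max_iters)           # 321 -> set max_iters in hparams.py to this
```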

Hope this helps!

MXGray avatar Mar 28 '18 10:03 MXGray

Hi @MXGray,

I've been training on the Nancy Corpus as you did, using the default parameters. I've trained to 233,000 steps, but whenever I synthesise a sentence it has lots of echoing afterwards, whereas your model doesn't generate echoes. I was wondering if you had any suggestions on how to fix this? Find an example below using the prompt "This is the example recording." c9aeb4d2-b929-4c46-bdef-d644a36280f3.wav.zip

Thanks

marcom48 avatar Apr 10 '18 07:04 marcom48

Hi @MXGray, Thank you for sharing such a well-trained model. I used your model as a pretrained model for synthesizing in my own language. My data is about 2.2 hours from a single speaker, and most phonemes map to CMUDict-like phonemes. After 950K iterations, the model can only align the first half of medium and long sentences. The second half is very poor, and it seems the model cannot learn to align the latter part of sentences. Why does this happen? How can I fix this?

Thanks

navidnadery avatar May 19 '18 13:05 navidnadery

@keithito Do you have any idea about the problem I mentioned above?

navidnadery avatar May 23 '18 13:05 navidnadery

@navidnadery I'm not sure why this would happen. Maybe there's a lack of long sentences in your training data? You can also try with Location Sensitive Attention (or hybrid attention) to see if that yields a better result.

keithito avatar May 24 '18 23:05 keithito

Hi guys!! Does anyone know if sample_rate in hparams.py needs to be changed to the sampling frequency of the sound files in the dataset? Like 16K for the Nancy dataset?
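
(A quick way to check what the dataset files actually use, for anyone unsure - a sketch; the wav path below is just a placeholder.)

```python
import wave

# Placeholder path: point this at any wav file from the Nancy dataset
with wave.open("Nancy/wavn/APDC2-017-01.wav", "rb") as w:
    print("dataset sample rate:", w.getframerate())
    # Compare this value with sample_rate in hparams.py
```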

Shikherneo2 avatar Jul 05 '18 15:07 Shikherneo2

@Shikherneo2 you mean sample_rate?

yoosif0 avatar Sep 07 '18 09:09 yoosif0

It seems that with the recent updates to the code, the Nancy pre-trained model can no longer be used. Does anyone have a similar issue?

marymirzaei avatar Oct 30 '18 09:10 marymirzaei

I think you cannot use a checkpoint once the hparams have changed.

yoosif0 avatar Oct 30 '18 12:10 yoosif0

Can anyone share a code snippet or some directions on how I can use this pre-trained model (using the same code as the one shared in this repo)?

I tried running this: !python3 eval.py --checkpoint nancy_model/model.ckpt-250000

but I get the following error: [...] NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key model/inference/decoder/output_projection_wrapper/multi_rnn_cell/cell_0/output_projection_wrapper/concat_output_and_attention_wrapper/decoder_prenet_wrapper/attention_wrapper/bahdanau_attention/attention_v not found in checkpoint
  [[node save/RestoreV2 (defined at /home/ec2-user/SageMaker/tacotron/synthesizer.py:24) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
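
For debugging, one way to compare what the checkpoint actually contains against what the current graph expects is to list its variables (a sketch using the checkpoint path from the command above):

```python
import tensorflow as tf

# Checkpoint path from the eval.py command above
checkpoint_path = "nancy_model/model.ckpt-250000"

# Print every variable name stored in the checkpoint, with its shape.
# Variables the graph expects but that are missing here (e.g. the
# bahdanau_attention ones in the error) suggest the code has diverged
# from the commit that produced the checkpoint.
for name, shape in tf.train.list_variables(checkpoint_path):
    print(name, shape)
```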

AyaLahlou avatar Jun 22 '20 14:06 AyaLahlou