
Nancy Corpus pre-trained

Open LearnedVector opened this issue 7 years ago • 28 comments

Is the Nancy corpus pre-trained model available anywhere for use? I think the one provided in the README is the LJ Speech trained model.

LearnedVector avatar Dec 18 '17 16:12 LearnedVector

@MXGray: would you be willing to share your pre-trained model on the Nancy corpus?

keithito avatar Dec 19 '17 22:12 keithito

@keithito @Mn0491

No problem - Here you go: https://github.com/keithito/tacotron/issues/15#issuecomment-342632496

MXGray avatar Dec 20 '17 00:12 MXGray

@MXGray Thank you so much!

The one that you linked to doesn't seem to perform as well as the samples at https://keithito.github.io/audio-samples/. Are they the same model? Here is a sample of what the model output:

  • "President Trump met with other leaders at the Group of 20 conferences."
  • https://instaud.io/1yDS (audio sample)

This is with the default demo_server.py, pointing the checkpoint to the Nancy corpus model you provided.

Thanks again for posting the link, and thank you @keithito for this awesome project.

LearnedVector avatar Dec 20 '17 01:12 LearnedVector

@Mn0491

Oh, my bad - That's the Tagalog model that I trained on top of the Nancy model. I'll upload the correct Nancy model tonight when I'm in front of my laptop and will post it here.

MXGray avatar Dec 20 '17 03:12 MXGray

@MXGray could you please upload the English model? The model at the Drive link you referenced doesn't generate English speech.

geneing avatar Dec 31 '17 23:12 geneing

@Mn0491 @geneing Sorry guys, crazy holidays. :) Here you go - Happy 2018! https://drive.google.com/file/d/1c_O-Gha03_erKbilsFCvs9QJ8faJ7ou8/view?usp=sharing

MXGray avatar Jan 01 '18 02:01 MXGray

@MXGray thank you! Happy 2018!

LearnedVector avatar Jan 01 '18 02:01 LearnedVector

@MXGray Thank you. It works now.

geneing avatar Jan 02 '18 06:01 geneing

@t3t3t3 Are you training a model on top of this? If so, then after preprocessing your training data you'll get a max output length. Divide this by outputs_per_step and use the result for max_iters in hparams.py ... Hope this helps. :)

MXGray avatar Jan 02 '18 08:01 MXGray

Thanks! I think I messed up the parameters somewhere. I reinstalled from a clean copy of the source code and it works now!

t3t3t3 avatar Jan 02 '18 08:01 t3t3t3

Excuse me. Can you tell me how to download the Nancy Corpus dataset from Blizzard 2011 at CSTR? I cannot even find the registration page. @MXGray @keithito

begeekmyfriend avatar Jan 16 '18 16:01 begeekmyfriend

@begeekmyfriend Hello! You need to click "license", which redirects to the page below. Fill in the form and wait a few days for it to be approved. You'll get 2 emails in total.

http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/license.html

gloriouskilka avatar Jan 16 '18 16:01 gloriouskilka

@gloriouskilka Thanks a lot. The Nancy Corpus dataset seems better for training English.

begeekmyfriend avatar Jan 16 '18 16:01 begeekmyfriend

@MXGray Thanks! Do you mind if I upload your trained model to a public GitHub repository? I'd like to make a Docker container for running Tacotron. (curl doesn't play well with Google Drive links)
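(For anyone hitting the same curl issue, here is a minimal Python sketch of the usual Google Drive confirm-token workaround. The file ID and output filename below are placeholders, and Drive's download flow changes over time, so treat this as an assumption rather than a guaranteed recipe.)

```python
import requests

# Placeholder: take <FILE_ID> from a drive.google.com/file/d/<FILE_ID>/view URL
FILE_ID = "YOUR_FILE_ID"
URL = "https://drive.google.com/uc?export=download"

session = requests.Session()
response = session.get(URL, params={"id": FILE_ID}, stream=True)

# Large files return an interstitial warning page; the confirm token is in a cookie
token = next((v for k, v in response.cookies.items()
              if k.startswith("download_warning")), None)
if token:
    response = session.get(URL, params={"id": FILE_ID, "confirm": token}, stream=True)

# Placeholder output name
with open("nancy_pretrained.tar", "wb") as f:
    for chunk in response.iter_content(chunk_size=32768):
        if chunk:
            f.write(chunk)
```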

js0nwu avatar Jan 31 '18 01:01 js0nwu

@ArkaneCow No, I don't mind. It'll be very helpful. Please share the link here once it's up. Thanks. :)

MXGray avatar Feb 02 '18 01:02 MXGray

@MXGray Thanks! I created a Docker for running this repository here: https://hub.docker.com/r/arkanecow/dockerfile-keithito-tacotron/ The Docker file is here: https://github.com/ArkaneCow/dockerfile-keithito-tacotron The repository where the models are hosted is here: https://github.com/ArkaneCow/tacotron-models

js0nwu avatar Feb 04 '18 06:02 js0nwu

@MXGray Could you please clarify, did you use the default parameters for training on Nancy corpus? Thanks in advance!

quadraaa avatar Feb 21 '18 11:02 quadraaa

@MXGray Thanks for your contribution. I listened to the demo audio for both LJSpeech and Nancy, and found that Nancy is better.

I downloaded the files from the official Nancy website you provided, but I do not know how to handle them. Therefore, I downloaded the wavs and text from here:

https://github.com/barronalex/Tacotron/blob/master/download_data.sh

I converted the contents of prompts.data into the metadata.csv format, e.g.: APDC2-017-01|Children act prematurely.|Children act prematurely.
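A minimal sketch of that conversion, assuming each prompts.data line is Festival-style, i.e. ( APDC2-017-01 "Children act prematurely." ) - adjust the regex if your copy of the corpus differs:

```python
import re

# Assumed input format per line: ( APDC2-017-01 "Children act prematurely." )
LINE_RE = re.compile(r'\(\s*(\S+)\s+"(.*)"\s*\)')

with open("prompts.data", encoding="utf-8") as fin, \
     open("metadata.csv", "w", encoding="utf-8") as fout:
    for line in fin:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't match the assumed format
        utt_id, text = m.groups()
        # LJSpeech-style metadata.csv: id|raw text|normalized text
        fout.write(f"{utt_id}|{text}|{text}\n")
```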

However, I got an error while running train.py:

Starting new training run at commit: None
Generated 32 batches of size 32 in 2.455 sec
Traceback (most recent call last):
  File "/home/chris/new_2018/u41_2_nancy/tacotron/tacotron/datasets/datafeeder.py", line 74, in run
    self._enqueue_next_group()
  File "/home/chris/new_2018/u41_2_nancy/tacotron/tacotron/datasets/datafeeder.py", line 96, in _enqueue_next_group
    self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "/home/chris/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/chris/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1096, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32, 665, 64) for Tensor 'datafeeder/mel_targets:0', which has shape '(?, ?, 80)'

Is there anything I should edit in the original code written by @keithito to account for the differences between the LJSpeech and Nancy corpora?

ghost avatar Mar 28 '18 09:03 ghost

@Quadraaa @DavidAksnes

All default parameters in hparams.py, except max_iters. This value should be set to the max output length divided by outputs_per_step. For example - after preprocessing the Nancy dataset, let's say you get 1605 as the max output length; the default outputs_per_step is 5, so:

1605 / 5 = 321 (this should be the value of max_iters)
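
In code form, the same calculation looks roughly like this (the 1605 figure is just the example above; use the max output length your own preprocessing reports):

```python
import math

max_output_length = 1605   # example value; read yours from the preprocessing output
outputs_per_step = 5       # default outputs_per_step in hparams.py

max_iters = math.ceil(max_output_length / outputs_per_step)
print(max_iters)           # 321 -> set max_iters in hparams.py to this
```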

Hope this helps!

MXGray avatar Mar 28 '18 10:03 MXGray

Hi @MXGray,

I've been training on the Nancy Corpus as you did, using the default parameters. I've trained to 233,000 steps, but whenever I synthesise a sentence it has lots of echoing afterwards, whereas your model doesn't generate echoes. I was wondering if you had any suggestions on how to fix this? Find an example below using the prompt "This is the example recording." c9aeb4d2-b929-4c46-bdef-d644a36280f3.wav.zip

Thanks

marcom48 avatar Apr 10 '18 07:04 marcom48

Hi @MXGray, Thank you for sharing such a well-trained model. I used your model as a pretrained model for synthesizing in my own language. My data is about 2.2 hours from a single speaker, and most phonemes map to CMUDict-like phonemes. After 950K iterations, the model can only align the first half of medium and long sentences. The second half is very poor, and it seems the model cannot learn to align the latter part of sentences. Why does this happen? How can I fix this?

Thanks

navidnadery avatar May 19 '18 13:05 navidnadery

@keithito Do you have any idea about the problem I mentioned above?

navidnadery avatar May 23 '18 13:05 navidnadery

@navidnadery I'm not sure why this would happen. Maybe there's a lack of long sentences in your training data? You can also try with Location Sensitive Attention (or hybrid attention) to see if that yields a better result.

keithito avatar May 24 '18 23:05 keithito

Hi guys!! Does anyone know if sample_rate in hparams.py needs to be changed to the sampling frequency of the sound files in the dataset? Like 16K for the Nancy dataset?
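
(A quick way to check what the dataset files actually use, for anyone unsure - a sketch; the wav path below is just a placeholder.)

```python
import wave

# Placeholder path: point this at any wav file from the Nancy dataset
with wave.open("Nancy/wavn/APDC2-017-01.wav", "rb") as w:
    print("dataset sample rate:", w.getframerate())
    # Compare this value with sample_rate in hparams.py
```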

Shikherneo2 avatar Jul 05 '18 15:07 Shikherneo2

@Shikherneo2 you mean sample_rate?

yoosif0 avatar Sep 07 '18 09:09 yoosif0

It seems that with the recent updates to the code, the Nancy pre-trained model can no longer be used. Does anyone have a similar issue?

marymirzaei avatar Oct 30 '18 09:10 marymirzaei

I think you cannot use a checkpoint once the hparams have changed.

yoosif0 avatar Oct 30 '18 12:10 yoosif0

Can anyone share a code snippet or some directions on how I can use this pre-trained model (using the same code as the one shared in this repo)?

I tried running this: !python3 eval.py --checkpoint nancy_model/model.ckpt-250000

but I get the following error: [...] NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key model/inference/decoder/output_projection_wrapper/multi_rnn_cell/cell_0/output_projection_wrapper/concat_output_and_attention_wrapper/decoder_prenet_wrapper/attention_wrapper/bahdanau_attention/attention_v not found in checkpoint
  [[node save/RestoreV2 (defined at /home/ec2-user/SageMaker/tacotron/synthesizer.py:24) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
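
For debugging, one way to compare what the checkpoint actually contains against what the current graph expects is to list its variables (a sketch using the checkpoint path from the command above):

```python
import tensorflow as tf

# Checkpoint path from the eval.py command above
checkpoint_path = "nancy_model/model.ckpt-250000"

# Print every variable name stored in the checkpoint, with its shape.
# Variables the graph expects but that are missing here (e.g. the
# bahdanau_attention ones in the error) suggest the code has diverged
# from the commit that produced the checkpoint.
for name, shape in tf.train.list_variables(checkpoint_path):
    print(name, shape)
```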

AyaLahlou avatar Jun 22 '20 14:06 AyaLahlou