How can I tell when training has gone wrong?
Thank you very much for your contribution. I have trained the model on LJ Speech for 835k steps, but the results are not as good as the samples you provided at 420k steps. Is there perhaps some problem with my training? Below you can find the attention plot and a sample audio at 835k steps. What kind of attention plot signals a good checkpoint for the synthesizer?
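(For what it's worth, here is a rough, hypothetical way to check an alignment numerically, assuming the attention matrix has been dumped as a NumPy array of shape (decoder_steps, encoder_steps); a healthy checkpoint usually shows a sharp, roughly monotonic diagonal, while a blurry or scattered plot suggests the model has not learned to attend properly.)

```python
import numpy as np

def alignment_sharpness(attn):
    """Rough health check for an attention matrix of shape
    (decoder_steps, encoder_steps): the mean of each decoder frame's
    maximum attention weight. Values close to 1.0 indicate sharp,
    focused attention; values near 1/encoder_steps indicate diffuse,
    poorly trained attention."""
    return float(np.mean(np.max(attn, axis=1)))

# Hypothetical usage: "alignment.npy" is a placeholder for a saved attention matrix.
attn = np.load("alignment.npy")
print("sharpness:", alignment_sharpness(attn))
```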
The training progress looked like this:
The samples synthesized from this checkpoint can be found here: https://www.dropbox.com/sh/n5ld72rn9otxl7a/AAACyplZMtxiYtuUgvWN8OGaa?dl=0
Also, the trained model (checkpoint) is uploaded here: https://www.dropbox.com/sh/ks91bdputl5ujo7/AABRIqpviRDBgWuFIJn1yuhba?dl=0
Also, I was wondering if you have any plans to release your trained model.
Another thing: `tf.train.Saver` keeps only the last 5 checkpoints by default, and the wrapper used here (i.e. `tf.train.Supervisor`) does not easily allow changing the `max_to_keep` property of the saver.
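For reference, a possible workaround (an untested sketch, assuming TF 1.x) is to construct the `Saver` yourself and pass it in, since the `Supervisor` constructor accepts a `saver` argument:

```python
import tensorflow as tf

# Possible workaround (TF 1.x): build the Saver explicitly with a larger
# max_to_keep and hand it to the Supervisor, instead of letting the
# Supervisor create its default Saver (which keeps only the last 5 checkpoints).
saver = tf.train.Saver(max_to_keep=20)
sv = tf.train.Supervisor(logdir="logdir", saver=saver)
```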
P.S. The hyperparameters are kept at their default values:
# signal processing
sr = 22050 # Sample rate.
n_fft = 2048 # fft points (samples)
frame_shift = 0.0125 # seconds
frame_length = 0.05 # seconds
hop_length = int(sr*frame_shift) # samples.
win_length = int(sr*frame_length) # samples.
n_mels = 80 # Number of Mel banks to generate
power = 1.2 # Exponent for amplifying the predicted magnitude
n_iter = 50 # Number of inversion iterations
preemphasis = .97 # or None
max_db = 100 # Maximum decibel value used for spectrogram normalization
ref_db = 20 # Reference decibel level subtracted before normalization
# model
embed_size = 256 # alias = E
encoder_num_banks = 16
decoder_num_banks = 8
num_highwaynet_blocks = 4
r = 5 # Reduction factor.
dropout_rate = .5
# training scheme
lr = 0.001 # Initial learning rate.
logdir = "logdir"
sampledir = 'samples'
batch_size = 32
num_iterations = 1000000
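As a sanity check on the signal-processing side, the derived frame parameters work out to a 275-sample hop and a 1102-sample window at 22050 Hz. Below is a hypothetical sketch (using librosa; the file name is just a placeholder) for verifying that the analysis settings match what the model was trained with, since a preprocessing mismatch is a common reason synthesized audio sounds worse than the reference samples:

```python
import librosa
import numpy as np

sr = 22050
n_fft = 2048
hop_length = int(sr * 0.0125)   # 275 samples per frame shift
win_length = int(sr * 0.05)     # 1102 samples per analysis window

# Hypothetical check: analyze one LJ Speech clip with the same STFT settings
# used during training.  "LJ001-0001.wav" is a placeholder path.
wav, _ = librosa.load("LJ001-0001.wav", sr=sr)
mag = np.abs(librosa.stft(wav, n_fft=n_fft,
                          hop_length=hop_length, win_length=win_length))
print(mag.shape)  # (1 + n_fft // 2, number_of_frames) = (1025, ...)
```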