tacotron
eval.py: Evaluation broken
Hey guys, today's commits seem to break the evaluation code, more precisely spectrogram2wav.
If you're not using log magnitudes, eval.py crashes during sampling because the buffer the audio is generated from contains non-finite values. I compared eval.py against the previous revision. Assuming you're not using log magnitudes (i.e., hp.use_log_magnitude = False), this is the change:
audio = spectrogram2wav(np.power(np.e, s)**hp.power)
changes to
audio = spectrogram2wav(s**hp.power)
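To see why the second line can produce a non-finite buffer, here is a minimal NumPy sketch. It assumes `s` is a predicted spectrogram that may contain small negative values (an unconstrained network output can) and that hp.power is a non-integer; the value 1.2 below is an assumption for illustration only.

```python
import numpy as np

power = 1.2  # assumed value of hp.power, for illustration only

# A toy "predicted" spectrogram: network outputs are unconstrained,
# so some entries can be slightly negative.
s = np.array([[0.5, -0.1], [1.0, 0.0]])

with np.errstate(invalid="ignore"):  # silence NumPy's "invalid value" warning
    old = np.power(np.e, s) ** power  # previous revision: exp() makes every entry positive
    new = s ** power                  # current revision: negative base, fractional exponent

print(np.isfinite(old).all())  # True
print(np.isfinite(new).all())  # False, because (-0.1) ** 1.2 is NaN
```

A single NaN in the magnitude is enough to make the whole reconstructed waveform non-finite.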
I'm not quite sure, since I'm a bit confused by all the log vs. non-log discussions going on, but I guess this is wrong. The paper says they're predicting on the mel scale. Correct me if I'm wrong, but isn't the mel scale already logarithmic? So my assumption is that the first line is still valid for non-log magnitudes, right?
Technically speaking, the mel scale is not exactly the same as a log. See https://en.wikipedia.org/wiki/Mel_scale. The paper says they use a mel spectrogram and a linear-scale log magnitude spectrogram. spectrogram2wav converts the predicted (linear-scale) magnitude into the waveform; it has nothing to do with the mel spectrogram.
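For reference, one common mel convention (the HTK-style formula m = 2595 log10(1 + f/700)) is sketched below; other conventions, such as Slaney's, differ in detail. It is roughly linear well below 700 Hz and logarithmic above it. Note also that the mel scale warps the frequency axis, whereas the log in "log magnitude" compresses the amplitude values, so a mel spectrogram can itself be stored with either linear or log amplitudes.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """HTK-style Hz -> mel conversion (one common convention, not the only one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Frequency-axis warping: low frequencies are spaced almost linearly,
# high frequencies are compressed logarithmically.
for f in (100, 200, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```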
In my opinion, there are two reasons why people care about whether we apply a logarithm to the magnitude in training. First, we (or at least I) don't fully understand why it is useful. Second, in practice it needs attention because padding happens in three places: frame reduction, dynamic padding, and convolution with "same" padding.
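One way to see why padding interacts with the log choice is sketched below. The helper `pad_to_multiple` and the reduction factor r=5 are hypothetical illustrations, not code from the repository. The point: zero padding means silence for plain magnitudes, but if the same zeros are interpreted as log magnitudes they decode to exp(0) = 1, i.e. non-silent frames.

```python
import numpy as np

def pad_to_multiple(spec: np.ndarray, r: int, pad_value: float = 0.0) -> np.ndarray:
    """Pad along the time axis (axis 0) so the frame count is divisible by r.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    n_frames = spec.shape[0]
    n_pad = (-n_frames) % r  # frames needed to reach the next multiple of r
    return np.pad(spec, ((0, n_pad), (0, 0)), mode="constant", constant_values=pad_value)

spec = np.ones((7, 3))              # 7 frames, 3 frequency bins
padded = pad_to_multiple(spec, r=5)
print(padded.shape)                 # (10, 3)

# The same zero padding, decoded two ways:
pad_frames = padded[7:]
print(pad_frames.max())             # 0.0: silence if these are plain magnitudes
print(np.exp(pad_frames).min())     # 1.0: not silence if these are log magnitudes
```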
For now I don't know why there's a problem when you set use_log_magnitude to False. I'll check soon.
Oh, now I know the reason. For plain magnitudes we must not allow negative values because of the power: raising a negative number to a non-integer power yields NaN. I simply clipped the values at zero. https://github.com/Kyubyong/tacotron/blob/master/eval.py#L70
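The clipping fix described above can be sketched as follows. The function name `magnitude_for_synthesis` is hypothetical, and this mirrors the description in the thread rather than the exact code at the linked line; it assumes `s` is a NumPy array and `power` corresponds to hp.power.

```python
import numpy as np

def magnitude_for_synthesis(s: np.ndarray, power: float, use_log_magnitude: bool) -> np.ndarray:
    """Turn a predicted spectrogram into a non-negative magnitude before synthesis.

    Hypothetical helper mirroring the fix described in the thread.
    """
    if use_log_magnitude:
        mag = np.exp(s)            # log magnitude -> magnitude, always positive
    else:
        mag = np.clip(s, 0, None)  # plain magnitude: clip negative values to zero
    return mag ** power

s = np.array([[0.5, -0.1], [1.0, 0.0]])
out = magnitude_for_synthesis(s, power=1.2, use_log_magnitude=False)
print(np.isfinite(out).all())  # True
```

Clipping keeps the buffer finite, but if most predicted values are negative the clipped spectrogram is mostly zeros, so the resulting audio can still be near-silent.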
Well, you're right that mel isn't exactly log. And to be honest, my explanation isn't much more than a first guess. I didn't check every part of the code, and I agree with you: I'm not quite sure what the log stuff is all about.
But even after your latest commits, I can't generate non-silent audio from my model if use_log_magnitude is False.