TTS [Feature request] Explain Tensorflow stats in the documentation

Open tcz opened this issue 1 year ago • 2 comments

🚀 Feature Description

I'm very confused about some of the stats in Tensorflow, and Google is not helping. I'm happy to make a documentation PR, but first I'd like to understand what these are:

avg_log_mle (how it differs from avg_loss)
avg_loss_dur (how it differs from avg_loss)
avg_amp_scaler
how the EvalFigures images are chosen (I imagine it's random)
how EvalAudios is chosen (also random?)
how to interpret EvalFigures/alignment?

Solution

Update docs with a description of each.

Alternative Solutions

Respond here and I'll do the PR.

Additional context

N/A

Aug 03 '22 06:08 tcz

avg_log_mle (how it differs from avg_loss) - I am not sure about this. What model are you training?
avg_loss_dur (how it differs from avg_loss) - The duration predictor loss.
avg_amp_scaler - The AMP loss scaler, which stabilizes model training in mixed-precision mode. Check the PyTorch docs for more details.
how the EvalFigures images are chosen (I imagine it's random) - The first sample of the last batch in the evaluation epoch.
how EvalAudios is chosen (also random?) - Same as above.
how to interpret EvalFigures/alignment? - It is hard to explain. You need some experience and domain knowledge. However, you can compare the output spectrogram with the real spectrogram. The alignment should be continuous and monotonic, with no cut-offs, and close to the diagonal.
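
The alignment criteria above (monotonic, no cut-offs, near diagonal) can be turned into a rough numeric check. The following is only a minimal sketch, not something taken from the TTS code base: `alignment_report` and `one_hot` are hypothetical helper names, and the matrix layout (one row per output frame, one column per input token) is an assumption, so transpose the matrix first if the plot is oriented the other way.

```python
def one_hot(path, n_tokens):
    """Build a toy alignment matrix from a list of attended token indices."""
    return [[1.0 if n == p else 0.0 for n in range(n_tokens)] for p in path]


def alignment_report(attn):
    """Summarize an alignment matrix (hypothetical helper, for illustration).

    Assumed layout: attn[t][n] is the weight output frame t puts on input token n.
    """
    n_frames = len(attn)
    n_tokens = len(attn[0])

    # Most-attended input token for every output frame.
    path = [max(range(n_tokens), key=lambda n: row[n]) for row in attn]
    steps = [b - a for a, b in zip(path, path[1:])]

    # Monotonic: the attended token never moves backwards.
    monotonic = all(s >= 0 for s in steps)
    # No cut-offs: the path starts on the first token and ends on the last one.
    complete = path[0] == 0 and path[-1] == n_tokens - 1
    # Continuous: the path never jumps over a token.
    max_jump = max(steps, default=0)

    # Near diagonal: mean distance (in tokens) from a straight line between the
    # corners. Real speech has uneven durations, so expect a small nonzero value.
    scale = (n_tokens - 1) / (n_frames - 1) if n_frames > 1 else 0.0
    deviation = sum(abs(p - t * scale) for t, p in enumerate(path)) / n_frames

    return {
        "monotonic": monotonic,
        "complete": complete,
        "max_jump": max_jump,
        "mean_diagonal_deviation": deviation,
    }


# Toy examples: 6 output frames attending over 3 input tokens.
good = alignment_report(one_hot([0, 0, 1, 1, 2, 2], 3))
scrambled = alignment_report(one_hot([0, 2, 1, 2, 0, 1], 3))
truncated = alignment_report(one_hot([0, 0, 0, 1, 1, 1], 3))
print(good)
print(scrambled)
print(truncated)
```

A healthy alignment should report `monotonic` and `complete` as true with a `max_jump` of 1, while a model that has not yet learned to align tends to fail the monotonic check. Treat the numbers as a quick sanity check alongside the plot, not as a replacement for looking at it.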

Aug 07 '22 12:08 erogol

Thank you for the explanations.

> I am not sure about this. What model are you training?

GlowTTS.

Aug 08 '22 06:08 tcz

I'll close this for now.

Aug 15 '22 09:08 erogol
