[Feature request] Explain Tensorflow stats in the documentation

Open tcz opened this issue 1 year ago • 2 comments

🚀 Feature Description

I'm very confused about some of the stats in Tensorflow and Google is not helping. I'm happy to make a documentation PR but I'd like to understand first what these are:

  • avg_log_mle (how it differs from avg_loss)
  • avg_loss_dur (how it differs from avg_loss)
  • avg_amp_scaler
  • how the EvalFigures images are chosen (I imagine it's random)
  • how EvalAudios is chosen (also random?)
  • how to interpret EvalFigures/alignment?

Solution

Update docs with a description of each.

Alternative Solutions

Respond here and I'll do the PR.

Additional context

N/A

tcz avatar Aug 03 '22 06:08 tcz

  • avg_log_mle (how it differs from avg_loss) - I am not sure about this. What model are you training?
  • avg_loss_dur (how it differs from avg_loss) - This is the duration predictor loss.
  • avg_amp_scaler - The AMP loss scaler, which stabilizes model training in mixed-precision mode. Check the PyTorch docs for more details.
  • how the EvalFigures images are chosen (I imagine it's random) - The first sample of the last batch in the evaluation epoch.
  • how EvalAudios is chosen (also random?) - Same as above.
  • how to interpret EvalFigures/alignment? - It is hard to explain. You need some experience and domain knowledge. In general, though, you compare the output spectrogram with the real spectrogram. The alignment should be continuous and monotonic, with no cut-offs, and close to the diagonal.

erogol avatar Aug 07 '22 12:08 erogol
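
To make the alignment advice above a little more concrete, here is a minimal sketch of how "monotonic, no cut-offs, near diagonal" could be checked numerically. It is not part of the TTS library. The function name `alignment_report` and the matrix layout (one row per output frame, one column per input token) are assumptions made for illustration, and the orientation of the TensorBoard figure may differ.

```python
def alignment_report(attention):
    """Summarize an alignment matrix given as a list of rows.

    Assumed layout (hypothetical): attention[t][i] is the weight that
    output frame t puts on input token i.
    """
    n_frames, n_tokens = len(attention), len(attention[0])

    # Most-attended input token for each output frame.
    peaks = [max(range(n_tokens), key=row.__getitem__) for row in attention]
    steps = [b - a for a, b in zip(peaks, peaks[1:])]

    # Monotonic: the peak never moves backwards through the input.
    monotonic = all(step >= 0 for step in steps)

    # Largest forward jump; values above 1 mean input tokens were skipped.
    max_jump = max(steps, default=0)

    # Mean distance of the peaks from the straight diagonal,
    # normalized to the range [0, 1].
    frame_span = max(n_frames - 1, 1)
    token_span = max(n_tokens - 1, 1)
    deviation = sum(
        abs(peak - t * (n_tokens - 1) / frame_span) / token_span
        for t, peak in enumerate(peaks)
    ) / n_frames

    return {"monotonic": monotonic, "max_jump": max_jump, "diag_deviation": deviation}


# A clean diagonal alignment and one that jumps backwards.
good = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
bad = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
good_report = alignment_report(good)
bad_report = alignment_report(bad)
```

In this sketch, a healthy alignment would report `monotonic` as true, a small `max_jump`, and a `diag_deviation` near zero. Real alignments are rarely perfectly diagonal, because phoneme durations vary, so any thresholds are a judgment call.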

Thank you for the explanations.

I am not sure about this. What model are you training?

GlowTTS.

tcz avatar Aug 08 '22 06:08 tcz

I'm closing this for now.

erogol avatar Aug 15 '22 09:08 erogol