TTS
[Feature request] Explain TensorBoard stats in the documentation
Feature Description
I'm very confused about some of the stats in TensorBoard, and Google is not helping. I'm happy to make a documentation PR, but first I'd like to understand what these are:
- avg_log_mle (how it differs from avg_loss)
- avg_loss_dur (how it differs from avg_loss)
- avg_amp_scaler
- how the EvalFigures images are chosen (I imagine it's random)
- how EvalAudios is chosen (also random?)
- how to interpret EvalFigures/alignment?
Solution
Update docs with a description of each.
Alternative Solutions
Respond here and I'll do the PR.
Additional context
N/A
- avg_log_mle (how it differs from avg_loss): I am not sure about this. What model are you training?
- avg_loss_dur (how it differs from avg_loss): This is the duration predictor loss.
- avg_amp_scaler: The AMP loss scaler, which stabilizes model training in mixed-precision mode. Check the PyTorch docs for more details.
- how the EvalFigures images are chosen (I imagine it's random): The first sample of the last batch in the evaluation epoch.
- how EvalAudios is chosen (also random?): Same as above.
- how to interpret EvalFigures/alignment?: It is hard to explain; you need some experience and domain knowledge. In general, you compare the output spectrogram with the real spectrogram. The alignment should be continuous and monotonic, with no cut-offs, and close to the diagonal.
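To make the alignment criteria above a bit more concrete, here is a minimal sketch that checks a small alignment matrix for monotonicity, jumps, cut-offs, and distance from the diagonal. It is not part of the TTS codebase: the function name `alignment_report`, the returned keys, and the matrix orientation (one row per output frame, one column per input token) are all assumptions made for illustration, and the plotted figure may be oriented differently.

```python
def alignment_report(attn):
    """Summarize an alignment matrix (hypothetical helper, not a TTS API).

    attn: list of rows, one per output (spectrogram) frame; each row holds
    the alignment weights over the input (text) tokens. This orientation
    is an assumption.
    """
    n_frames = len(attn)
    n_tokens = len(attn[0])

    # Token with the highest weight for each frame (first one on ties).
    path = [max(range(n_tokens), key=row.__getitem__) for row in attn]
    steps = [b - a for a, b in zip(path, path[1:])]

    # Monotonic: the attended token never moves backwards.
    monotonic = all(step >= 0 for step in steps)
    # Continuous: a largest forward step above 1 means tokens were skipped.
    max_jump = max(steps, default=0)
    # No cut-off: the final frame attends to the final token.
    reaches_end = path[-1] == n_tokens - 1

    # Mean distance from the ideal diagonal, normalized to [0, 1].
    if n_frames > 1 and n_tokens > 1:
        slope = (n_tokens - 1) / (n_frames - 1)
        deviation = sum(abs(p - i * slope) for i, p in enumerate(path))
        diag_dev = deviation / n_frames / (n_tokens - 1)
    else:
        diag_dev = 0.0

    return {
        "monotonic": monotonic,
        "max_jump": max_jump,
        "reaches_end": reaches_end,
        "diag_dev": diag_dev,
    }


def one_hot_rows(path, n_tokens):
    """Build a toy alignment matrix from a list of token indices."""
    return [[1.0 if j == p else 0.0 for j in range(n_tokens)] for p in path]


good = alignment_report(one_hot_rows([0, 0, 1, 1, 2, 2], 3))
bad = alignment_report(one_hot_rows([0, 2, 1, 1, 1, 1], 3))
print(good)
print(bad)
```

In this toy example, `good` follows the diagonal and reaches the last token, while `bad` jumps ahead, steps backwards, and stalls before the end. A real alignment is soft rather than one-hot, so a check like this is at most a rough aid and does not replace looking at the figure.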
Thank you for the explanations.
> I am not sure about this. What model are you training?
GlowTTS.
I'll close this for now.