
Variance floored

Ctibor67 opened this issue on Dec 09 '21 · 2 comments

When I train (on my language, Czech), "variance floored" is sometimes displayed, but training usually continues. Is this a mistake, and how do I fix this error? (My batch size is already only 1 on a GTX 1080 with 8 GB, so it cannot be reduced any further.) Also, could you describe in HPARAMS what each line means (at least the most important ones)?

Ctibor67 · Dec 09 '21 16:12

How does the synthetic speech sound after training for a few thousand updates?

Variance flooring is not an error and training is expected to continue. The corresponding hyperparameter variance_floor is a lower bound/threshold on the standard deviations σ predicted by the model. The message means that the predicted standard deviation was smaller than the bound (i.e., σ < variance_floor), and thus σ was set to equal variance_floor instead. I believe this should be unrelated to batch size.
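As a minimal sketch of what flooring does to the predicted σ (assumed PyTorch-style code for illustration, not the repository's actual implementation):

```python
import torch

def floor_std(sigma: torch.Tensor, variance_floor: float = 0.001) -> torch.Tensor:
    """Hypothetical helper: clamp predicted standard deviations from below."""
    if (sigma < variance_floor).any():
        # This is the situation the "variance floored" message reports;
        # it is informational, not an error, and training continues.
        print("variance floored")
    return torch.clamp(sigma, min=variance_floor)
```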

Very low σ-values can indicate a degenerate model that's overfitting to a single observation, and variance flooring is a protection against this. Such flooring is important in, for instance, classic decision-tree-based text-to-speech.

If you are using the default value variance_floor=0.001 and your data is normalised to global mean 0 and standard deviation 1, the warnings suggest to me that there may be pathologies in data or training. My inclination would be to check for issues with the data/processing and to try increasing variance_floor to at least 0.1. (I personally believe that the repository default value probably is too small, but we have not tried tuning it.) This should lead to a lot more warning messages about flooring, which you could comment out to get cleaner log files, but it will provide better protection against degenerate optima.
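If it helps, here is a hypothetical sanity check for the global normalisation of your stacked acoustic features (the function name and tolerance are my own, not part of this repo):

```python
import numpy as np

def check_normalisation(features: np.ndarray, tol: float = 0.1) -> None:
    """features: (num_frames, num_dims) array of all training frames stacked."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    bad_mean = np.flatnonzero(np.abs(mean) > tol)
    bad_std = np.flatnonzero(np.abs(std - 1.0) > tol)
    if bad_mean.size or bad_std.size:
        print("Dims with suspicious mean:", bad_mean.tolist())
        print("Dims with suspicious std: ", bad_std.tolist())
    else:
        print("Global statistics look normalised (mean ~ 0, std ~ 1).")
```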

ghenter · Dec 09 '21 22:12

One way to think about it is that, for high values of variance_floor (when the variance is floored all the time), training becomes equivalent to training with the mean squared error (MSE) loss function. For lower values of variance_floor, our training can give a slightly different model than training with the MSE loss would. Maximum-likelihood training, as in this repo, is in some sense theoretically more general/"smarter" than MSE, but in practice it can also go wrong sometimes, and variance flooring provides an (adjustable) level of protection against such situations.
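To make the equivalence concrete, here is a small numerical check (my own illustration, not code from this repo): with σ held constant, as when the floor is always active, the Gaussian negative log-likelihood equals the MSE scaled by 1/(2σ²) plus a constant, so both losses share the same optimum for the means.

```python
import math
import torch

torch.manual_seed(0)
x = torch.randn(4, 80)    # target acoustic frames
mu = torch.randn(4, 80)   # predicted means
sigma = 0.5               # pretend the floor clamps every sigma to this value

# Per-element Gaussian negative log-likelihood with constant sigma:
nll = (0.5 * math.log(2 * math.pi * sigma**2)
       + (x - mu) ** 2 / (2 * sigma**2)).mean()

# Plain mean squared error on the same predictions:
mse = ((x - mu) ** 2).mean()

# With sigma fixed, nll == mse / (2 * sigma**2) + constant, so both losses
# have proportional gradients w.r.t. mu and the same optimum for the means.
assert torch.isclose(nll, mse / (2 * sigma**2)
                     + 0.5 * math.log(2 * math.pi * sigma**2))
```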

ghenter · Dec 10 '21 15:12