Neural-HMM
Variance floored
When I train (on data in my language, Czech), a "variance floored" message is sometimes displayed, but training usually continues. Is it a mistake, and how do I fix this error? (My batch size is already only 1 on a GTX 1080 with 8 GB, so it can't be reduced any further.) Could you also describe in HPARAMS what each line means (at least the most important code lines)?
How does the synthetic speech sound after training for a few thousand updates?
Variance flooring is not an error and training is expected to continue. The corresponding hyperparameter `variance_floor` is a lower bound/threshold on the standard deviations σ predicted by the model. The message means that the predicted standard deviation was smaller than the bound (i.e., σ < `variance_floor`), and thus σ was set equal to `variance_floor` instead. I believe this should be unrelated to batch size.
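A minimal sketch of what the flooring amounts to (the helper name and exact mechanics are illustrative, not the repository's actual code):

```python
import torch

def floor_std(sigma: torch.Tensor, variance_floor: float = 0.001) -> torch.Tensor:
    """Clamp predicted standard deviations from below (hypothetical helper,
    not the repository's exact implementation)."""
    return torch.clamp(sigma, min=variance_floor)

# A predicted sigma of 1e-5 would trigger a "variance floored" message
# and be replaced by the floor value.
sigma = torch.tensor([0.5, 1e-5, 0.02])
print(floor_std(sigma, variance_floor=0.001))  # tensor([0.5000, 0.0010, 0.0200])
```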
Very low σ-values can indicate a degenerate model that's overfitting to a single observation, and variance flooring is a protection against this. Such flooring is of importance in, for instance, classic decision-tree-based text-to-speech.
If you are using the default value `variance_floor=0.001` and your data is normalised to global mean 0 and standard deviation 1, the warnings suggest to me that there may be pathologies in the data or training. My inclination would be to check for issues with the data/processing and to try increasing `variance_floor` to at least `0.1`. (I personally believe that the repository default value probably is too small, but we have not tried tuning it.) This should lead to a lot more warning messages about flooring, which you could comment out to get cleaner log files, but it will provide better protection against degenerate optima.
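One quick way to sanity-check the normalisation assumption is to compute per-dimension statistics over your extracted acoustic features. This is only a rough sketch; `mel_paths` and the feature layout (frames × dimensions) are placeholders for however your data is actually stored:

```python
import numpy as np

def check_normalisation(features: np.ndarray, tol: float = 0.1) -> None:
    """Check that per-dimension statistics are roughly zero-mean, unit-variance
    (illustrative helper; adapt to your own feature storage)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    print("max |mean|:", np.abs(mean).max())
    print("min std:", std.min(), "max std:", std.max())
    if np.abs(mean).max() > tol or np.abs(std - 1.0).max() > tol:
        print("Warning: features do not look globally normalised.")

# e.g. feats = np.concatenate([np.load(p) for p in mel_paths], axis=0)
# check_normalisation(feats)
```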
One way to think about it is that, for high values of `variance_floor` (when the variance is floored all the time), training becomes equivalent to training with the mean squared error (MSE) loss function. For lower values of `variance_floor`, our training can give a slightly different model than training with the MSE loss would. Maximum-likelihood training, as in this repo, is in some sense theoretically more general/"smarter" than MSE, but in practice it can also go wrong sometimes, and variance flooring provides an (adjustable) level of protection against such situations.
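To make the equivalence concrete, here is a small numerical check (assuming a diagonal Gaussian likelihood with a globally constant, floored σ; this is illustrative only, not code from the repository): with σ fixed, the Gaussian negative log-likelihood is just a rescaled MSE plus a constant, so both losses share the same optimum.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 80)             # "observed" acoustic frames
mu = torch.randn(4, 80)            # predicted means
sigma = torch.full_like(mu, 0.1)   # constant (floored) standard deviation

# Gaussian NLL summed over all elements
nll = 0.5 * (((x - mu) / sigma) ** 2
             + 2 * torch.log(sigma)
             + torch.log(torch.tensor(2 * torch.pi))).sum()

# Rescaled MSE plus a sigma-dependent constant gives the same value
mse = F.mse_loss(mu, x, reduction="sum")
const = (torch.log(sigma).sum()
         + 0.5 * x.numel() * torch.log(torch.tensor(2 * torch.pi)))
print(torch.allclose(nll, 0.5 * mse / sigma[0, 0] ** 2 + const))  # True
```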