Check weights for NaN

Open PeterMinin opened this issue 3 years ago • 0 comments

If any weights in the network become NaN during training, I'd like training to stop with an error. Currently, when training on a GPU, no error is raised: training simply continues and usually gives poor results. The problem only surfaces later, when the model is run on the CPU and a "Floating-point invalid operation" is thrown. Perhaps such a check could be an optional step (on by default) after each parameter update.
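To illustrate the behavior I have in mind, here is a minimal sketch in plain Python. It does not use the NeoML API: `check_for_nan`, `NaNWeightsError`, `train`, and the `weights` mapping (layer name to a flat list of floats) are all hypothetical names made up for this example, and the `check_nan` flag stands in for the proposed on-by-default option.

```python
import math
from typing import Callable, Dict, List


class NaNWeightsError(RuntimeError):
    """Raised when a weight value is NaN (hypothetical error type)."""


def check_for_nan(weights: Dict[str, List[float]]) -> None:
    """Raise NaNWeightsError on the first NaN found in any weight list."""
    for name, values in weights.items():
        for index, value in enumerate(values):
            if math.isnan(value):
                raise NaNWeightsError(
                    f"NaN weight in '{name}' at index {index}"
                )


def train(
    weights: Dict[str, List[float]],
    update_step: Callable[[Dict[str, List[float]], int], None],
    num_steps: int,
    check_nan: bool = True,  # assumed default, mirroring "on by default"
) -> Dict[str, List[float]]:
    """Apply `update_step` repeatedly, checking weights after each update."""
    for step in range(num_steps):
        update_step(weights, step)  # stands in for one parameter update
        if check_nan:
            check_for_nan(weights)  # fail fast instead of training on NaN
    return weights
```

With the check enabled, training stops at the first update that produces a NaN and the error names the affected layer. With `check_nan=False`, the loop behaves as it does today and keeps going. A real implementation would presumably run the check on the device that holds the weights rather than element by element on the host.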

PeterMinin, Feb 22 '22 19:02