Grzegorz George Pawelczak
Results
3 comments by Grzegorz George Pawelczak
One could write an optimizer (for example Adam) for a model which has the weights and gradients in fp16, but the slot variables might have to be in higher precision...
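To make the mixed-precision point above concrete, here is a minimal sketch of one Adam step where the weights and gradients stay in fp16 while the slot variables (first and second moments) are held in fp32. It uses plain NumPy rather than TF/Keras, and the function name `adam_step_fp16` and the default hyperparameters are assumptions made for illustration, not something taken from the comment.

```python
import numpy as np

def adam_step_fp16(w, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (hypothetical helper).

    w, g : float16 weights and gradients
    m, v : float32 slot variables (first and second moment estimates)
    step : 1-based step counter used for bias correction
    """
    g32 = g.astype(np.float32)                  # upcast before touching the slots
    m = beta1 * m + (1.0 - beta1) * g32         # first moment, kept in fp32
    v = beta2 * v + (1.0 - beta2) * g32 * g32   # second moment, kept in fp32
    m_hat = m / (1.0 - beta1 ** step)           # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** step)
    update = lr * m_hat / (np.sqrt(v_hat) + eps)
    # Apply the update in fp32, then store the weights back as fp16.
    w_new = (w.astype(np.float32) - update).astype(np.float16)
    return w_new, m, v

# Small gradients show why the slots need more range than fp16 offers:
# squaring 1e-4 in fp16 underflows, while the fp32 second moment does not.
w = np.ones(4, dtype=np.float16)
g = np.full(4, 1e-4, dtype=np.float16)
m = np.zeros(4, dtype=np.float32)
v = np.zeros(4, dtype=np.float32)
w_new, m, v = adam_step_fp16(w, g, m, v, step=1)
```

The choice to upcast the gradient first is what keeps `g * g` from underflowing; whether the weight update itself should also be accumulated in higher precision is a separate design decision that this sketch does not settle.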
Any thoughts on the proposed solution?
Hey, I just wanted to throw in some personal experience with working on gradient accumulation in TF/Keras at Graphcore for IPUs. 1. Batch Norm - for the MLPerf submission distributed...