Grzegorz George Pawelczak

3 comments by Grzegorz George Pawelczak

One could write an optimizer (for example, Adam) for a model whose weights and gradients are in fp16, but the slot variables might have to be in higher precision...
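To make the point about slot variables concrete, here is a rough sketch of a single scalar Adam step with the weight and gradient kept in fp16 and the slots (`m`, `v`) stored in either fp16 or fp32. This is not taken from any real optimizer: `to_fp16`, `to_fp32`, `adam_step`, and the `slot_cast` argument are names made up for illustration, the hyperparameters are just common defaults, and fp16 is simulated with the standard library's half-precision `struct` format rather than with TF tensors.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def adam_step(w, g, m, v, step, slot_cast,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One scalar Adam update (hypothetical helper, illustration only).

    w and g are assumed to already be fp16 values. slot_cast decides the
    storage precision of the slot variables m and v. The arithmetic itself
    runs in Python floats; only the stored results are rounded.
    """
    m = slot_cast(beta1 * m + (1.0 - beta1) * g)
    v = slot_cast(beta2 * v + (1.0 - beta2) * g * g)
    m_hat = m / (1.0 - beta1 ** step)   # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    w = to_fp16(w - lr * m_hat / (math.sqrt(v_hat) + eps))
    return w, m, v


w0 = to_fp16(1.0)
g = to_fp16(1e-3)   # a small but perfectly representable fp16 gradient

# Slots stored in fp16: (1 - beta2) * g * g is about 1e-9, which is below
# the smallest fp16 subnormal (2**-24, about 6e-8), so v is stored as zero.
w16, m16, v16 = adam_step(w0, g, 0.0, 0.0, 1, to_fp16)

# Slots stored in fp32: v keeps its tiny value and the update stays sane.
w32, m32, v32 = adam_step(w0, g, 0.0, 0.0, 1, to_fp32)

print("fp16 slots: v =", v16, " w =", w16)
print("fp32 slots: v =", v32, " w =", w32)
```

With fp16 slots the second-moment estimate underflows to zero, so the step is divided by `eps` alone and the weight jumps far away from its starting value; with fp32 slots the weight moves by roughly `lr`. A real implementation would also have to think about loss scaling and the precision of the arithmetic itself, which this sketch ignores.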

Hey, I just wanted to throw in some personal experience with working on gradient accumulation in TF/Keras at Graphcore for IPUs. 1. Batch Norm - for the MLPerf submission distributed...