mk
I had the same problem. I found that it happens because `adafactor` returns `float32` updates even though the params and gradients are `bfloat16`, while `MultiSteps` expects them to be of the...