overvalidated
Results: 1 issue by overvalidated
When FP8 is enabled, a model loaded in fp16 (Llama) runs out of memory (OOM) during training. The same model trains without problems in plain fp16 mode. My guess is that the autocast of the model to TE layers...