overvalidated

Results: 1 issue by overvalidated

When FP8 is enabled, a model loaded in fp16 (Llama) runs out of memory (OOM) during training. The same model works perfectly in fp16 mode. My guess is that autocast of the model to TE layers...