Results: 2 comments of Teng-xu
Yeah, bf16 was passed into the training args, and I can verify that it is being applied correctly.
I am observing the same behavior during training without FP8, and I believe that these states are causing problems when attempting to load checkpoints into the model, especially when there...