Teng-xu

2 comments by Teng-xu

Yeah, bf16 was passed in the training args, and I can verify it is being applied correctly.

I am observing the same behavior during training without FP8, and I believe that these states are causing problems when attempting to load checkpoints into the model, especially when there...