TomYang-TZ

When using Axolotl, the training loss drops to 0 after the gradient accumulation steps. Is this expected behaviour? With torchrun, the training loss stays at NaN throughout. Thanks for the help!...