TomYang-TZ
When training with Axolotl, the training loss drops to 0 after the gradient accumulation steps. Is this expected behaviour? With torchrun, the training loss is consistently NaN. Thanks for the help!!...