TomYang-TZ issues

Repositories
Issues
Comments

Results 1 issues of


                                            TomYang-TZ

Medusa Training Loss

When utilizing Axolotl, the training loss reduces to 0 following the gradient accumulation steps. Is this expected behaviour? With Torchrun, the training loss consistently remains NaN. Thanks for the help!!...