torchtune

v0.3 regression: full_finetune_distributed slower?

Open Delaunay opened this issue 4 months ago • 6 comments

The full_finetune_distributed recipe appears to be much slower in v0.3 than in v0.2.1.

Everything seems to work as usual, but a job that used to complete under v0.2.1 now times out under v0.3.0.

I don't have many details yet, but since you are more familiar with the codebase, you may already have an idea of the cause based on what changed recently!

Delaunay, Sep 30 '24 14:09