sd-scripts
block swap with no_grad context
Add a with torch.no_grad() context around block swapping. Unfortunately, block swapping doesn't work well with multi-GPU training, but it may solve the loss=NaN issue.
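For illustration, here is a minimal sketch of what wrapping the block-swap transfers in torch.no_grad() looks like. This is not sd-scripts' actual implementation; the function and variable names are hypothetical, and the idea is simply that the parameter copies between CPU and GPU happen outside autograd tracking.

```python
# Hypothetical sketch of block swapping under torch.no_grad().
# Not the sd-scripts implementation; names and structure are assumptions.
import torch
import torch.nn as nn


def swap_blocks(blocks_to_cpu, blocks_to_gpu, device):
    """Offload one set of transformer blocks and bring another set onto the GPU."""
    # Running the .to() transfers under no_grad keeps autograd out of the
    # device copies; per the PR description, this may help with the
    # loss=NaN issue seen with block swapping.
    with torch.no_grad():
        for block in blocks_to_cpu:
            block.to("cpu", non_blocking=True)
        for block in blocks_to_gpu:
            block.to(device, non_blocking=True)
    if device.type == "cuda":
        torch.cuda.synchronize(device)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Stand-in for a model's transformer blocks.
    blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
    # Keep the first two blocks on the accelerator, offload the rest.
    swap_blocks(blocks[2:], blocks[:2], device)
```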
What are the issues with multi-GPU block swapping? Would we want to distribute blocks to different devices, or is it library-specific?