sd-scripts
block swap with no_grad context
Add a with torch.no_grad() context around block swapping. Unfortunately, block swapping doesn't work well with multi-GPU training, but it may solve the loss=NaN issue.
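For illustration, here is a minimal sketch of what wrapping the block-swap transfers in torch.no_grad() looks like. This is not sd-scripts' actual implementation; the function and variable names are hypothetical, and the idea is simply that the parameter copies between CPU and GPU happen outside autograd tracking.

```python
# Hypothetical sketch of block swapping under torch.no_grad().
# Not the sd-scripts implementation; names and structure are assumptions.
import torch
import torch.nn as nn


def swap_blocks(blocks_to_cpu, blocks_to_gpu, device):
    """Offload one set of transformer blocks and bring another set onto the GPU."""
    # Running the .to() transfers under no_grad keeps autograd out of the
    # device copies; per the PR description, this may help with the
    # loss=NaN issue seen with block swapping.
    with torch.no_grad():
        for block in blocks_to_cpu:
            block.to("cpu", non_blocking=True)
        for block in blocks_to_gpu:
            block.to(device, non_blocking=True)
    if device.type == "cuda":
        torch.cuda.synchronize(device)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Stand-in for a model's transformer blocks.
    blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
    # Keep the first two blocks on the accelerator, offload the rest.
    swap_blocks(blocks[2:], blocks[:2], device)
```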
What are the issues with multi-GPU block swapping? Would we want to distribute blocks to different devices, or is it library-specific?