sd-scripts
How is multi-GPU loss gathered?
I've been looking into the sd3 train branch and I'm trying to understand how the loss is gathered for multi-GPU training, and the logic behind it. I'm used to working with accelerator.gather/reduce for loss and tensor updates. However, I'm not seeing either of those used in the sd3 training script, which got me curious: how are the losses gathered across all processes?