Marigold
Out of memory when training with RTX4090, seeking guidance on training details
I am currently attempting to reproduce the training process described in your paper using Stable Diffusion v2. However, my RTX 4090 ran out of memory when training with batch size 32, as mentioned in the paper. I use a resolution of 768x768 (the same as Stable Diffusion v2), and I am uncertain whether this setting is appropriate.
They use a different way of training, I think: they concatenate the color image latent with the depth latent, similar to how inpainting models are trained, though not identical. Have you covered that? (Rough sketch below.)
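Here is a rough sketch of what that kind of concatenation could look like in latent space; the tensor names and shapes are only illustrative, not the authors' code:

```python
import torch

# Illustrative sketch (not the authors' code): stack the RGB latent and the
# noisy depth latent along the channel dimension before feeding the UNet,
# similar in spirit to how inpainting models add extra conditioning channels.
rgb_latent = torch.randn(2, 4, 96, 96)          # VAE-encoded color image (B, 4, H/8, W/8)
noisy_depth_latent = torch.randn(2, 4, 96, 96)  # VAE-encoded depth map with noise added

# The UNet input becomes 8 channels instead of 4, so the input convolution of
# SD would need to be widened (e.g. by duplicating its weights) to accept it.
unet_input = torch.cat([rgb_latent, noisy_depth_latent], dim=1)
print(unet_input.shape)  # torch.Size([2, 8, 96, 96])
```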
Try lowering the batch size; it doesn't make a big difference, I think. My RTX 4090 is limited to a batch size of 10 at 768x768.
I know little about how inpainting training works. Could you please share the related papers or web pages?
Thanks for your interest in our work. As described in Section 4.1 of our paper, we use gradient accumulation to reduce the memory pressure during training:
Training our method takes 18K iterations using a batch size of 32. To fit one GPU, we accumulate gradients for 16 steps.
I am trying to follow this work recently. Can you share your training code?
@markkua Do you mean you use a batch size of 2 per loop (forward → calculate loss → backward) and update the weights after 16 loops, so the effective batch size is 2*16 = 32? Or do you use a batch size of 32 per loop, making the effective batch size 32*16 = 512? Hope you can clarify this, thanks.
The effective batch size is 32, i.e. 2*16 = 32.
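For anyone unsure what that looks like in practice, here is a minimal sketch of gradient accumulation with a micro-batch of 2 and 16 accumulation steps; the model, loss, and data are stand-ins, not the actual Marigold training code:

```python
import torch
import torch.nn as nn

# Minimal gradient-accumulation sketch (placeholders, not the Marigold code):
# micro-batch of 2, 16 accumulation steps -> effective batch size 2 * 16 = 32.
model = nn.Linear(8, 1)                      # stand-in for the denoising UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
accum_steps = 16

optimizer.zero_grad()
for step in range(accum_steps * 4):               # a few effective iterations
    x, y = torch.randn(2, 8), torch.randn(2, 1)   # micro-batch of 2
    loss = loss_fn(model(x), y) / accum_steps     # scale so gradients average
    loss.backward()                               # gradients add up in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                          # one update per 16 micro-batches
        optimizer.zero_grad()
```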
Hi, I still encounter an OOM error on a 4090 24G with image resolution 480x640 and batch size 2, so I am wondering if the authors used the fp16 version of Stable Diffusion. Can you clarify? @markkua Thank you.
Hi, we didn't use fp16 during training.
Ok, thanks. Could you let us know what image resolution you used for training, and which version of Stable Diffusion?
We used SD v2. For the resolution, please refer to Sec. 4.2 for details.
Thanks for the reply. According to the paper, the image resolution is 640x480, but I really cannot fit batch size 2 on a 4090 24G graphics card. Could anything be going wrong? Any help is appreciated.
Please check if this helps.
Thank you for the information. But I still can't make it work even with xformers; I had to reduce the resolution to 240x320.
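For reference, this is roughly what I enabled on the UNet (a sketch using the diffusers API; whether it matches the authors' setup is an assumption, and my training loop differs):

```python
import torch
from diffusers import UNet2DConditionModel

# Memory-saving switches when fine-tuning the SD v2 UNet on a 24 GB card
# (a guess at a workable setup, not the authors' configuration).
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="unet"
)
unet.enable_gradient_checkpointing()               # trade extra compute for memory
unet.enable_xformers_memory_efficient_attention()  # requires xformers to be installed
unet.to("cuda")
```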