
Out of memory when training with RTX4090, seeking guidance on training details

Open zuodexin opened this issue 1 year ago • 2 comments

I am currently attempting to reproduce the training process described in your paper using Stable Diffusion v2. However, my RTX 4090 ran out of memory when training with a batch size of 32, as mentioned in the paper. I use a resolution of 768x768 (the same as Stable Diffusion v2) and I am uncertain whether this setting is appropriate.

zuodexin avatar Dec 21 '23 15:12 zuodexin

I think they use a different way of training: they concatenate the color image with the depth image, just like inpainting training, but I think the details differ... have you covered that?
Try lowering the batch size; I don't think it changes the results much. My RTX 4090 is limited to a batch size of 10 at 768x768.
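
For reference, a minimal sketch of what such inpainting-style conditioning could look like (placeholder tensors and shapes, not Marigold's actual code): the color-image latent is concatenated channel-wise with the noisy depth latent before being fed to the UNet.

```python
import torch

# Sketch of inpainting-style conditioning (placeholder shapes, not Marigold's code):
# the encoded color image is concatenated channel-wise with the noisy depth latent,
# so the UNet receives 8 input channels instead of 4.
rgb_latent = torch.randn(2, 4, 96, 96)          # VAE-encoded color image
noisy_depth_latent = torch.randn(2, 4, 96, 96)  # noised depth latent at timestep t

unet_input = torch.cat([rgb_latent, noisy_depth_latent], dim=1)  # shape (2, 8, 96, 96)
# The UNet's first convolution would then need to be widened to accept 8 channels.
```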

mr-lab avatar Dec 21 '23 16:12 mr-lab

I know little about what inpainting training is like. Could you please share the related papers or web pages?

zuodexin avatar Dec 23 '23 08:12 zuodexin

Thanks for your interest in our work. As described in Section 4.1 of our paper, we use gradient accumulation to account for the memory pressure during training:

Training our method takes 18K iterations using a batch size of 32. To fit one GPU, we accumulate gradients for 16 steps.

markkua avatar Jan 03 '24 11:01 markkua

I have recently been trying to follow this work. Can you share your training code?

willpat1213 avatar Jan 14 '24 12:01 willpat1213

@markkua Do you mean you use a batch size of 2 in each loop (forward → compute loss → backward), and after 16 loops the weights are updated, so the actual batch size is 2*16=32, right? Or do you use a batch size of 32 in each loop, making the actual batch size 32*16=512? Hope you can clarify this, thanks.

ZachL1 avatar Jan 29 '24 08:01 ZachL1

The effective batch size is 32, i.e. 2*16 = 32.
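
For clarity, a minimal sketch of this kind of gradient accumulation in PyTorch (dummy model and data, not the actual Marigold training loop): a micro-batch of 2 per forward/backward pass, with one optimizer update every 16 steps.

```python
import torch
from torch import nn

# Gradient-accumulation sketch (dummy model and data, not Marigold's training code):
# micro-batch of 2 per forward/backward pass, accumulated over 16 steps,
# giving an effective batch size of 2 * 16 = 32.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
accumulation_steps = 16

optimizer.zero_grad()
for step in range(64):                                  # 64 micro-batches of size 2
    x = torch.randn(2, 8)                               # micro-batch: 2 samples
    loss = model(x).pow(2).mean() / accumulation_steps  # scale so accumulated grads average
    loss.backward()                                     # gradients add up in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                                # one update per 16 micro-batches
        optimizer.zero_grad()
```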

markkua avatar Jan 29 '24 10:01 markkua

Hi, I still encounter an OOM error on a 4090 24G with image resolution 480x640 and batch size 2, so I am wondering if the authors used the fp16 version of Stable Diffusion. Can you clarify? @markkua Thank you.

songlin avatar Mar 25 '24 09:03 songlin

Hi, we didn't use fp16 during training.

markkua avatar Mar 25 '24 12:03 markkua

OK, thanks. Could you let us know what image resolution you used for training, and which version of Stable Diffusion?

Best, Songlin Wei


songlin avatar Mar 25 '24 12:03 songlin

We used SD v2. For the resolution, please refer to Sec. 4.2 of the paper for details.

markkua avatar Mar 25 '24 12:03 markkua

Thanks for the reply. According to the paper, the image resolution is 640x480, but I really cannot fit batch size 2 onto a 4090 24G graphics card. What could be going wrong? Any help is appreciated.

songlin avatar Mar 25 '24 12:03 songlin

Please check if this helps.
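
For anyone hitting the same limit, a sketch of the usual memory-saving switches in diffusers (an assumption about what might help on a 24 GB card, not the authors' actual training setup):

```python
from diffusers import UNet2DConditionModel, AutoencoderKL

# Hypothetical memory-saving setup for fine-tuning SD v2 on a 24 GB GPU
# (not the authors' training code): gradient checkpointing, xformers attention,
# and sliced VAE decoding all reduce peak activation memory.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="unet"
)
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="vae"
)

unet.enable_gradient_checkpointing()               # trade extra compute for activation memory
unet.enable_xformers_memory_efficient_attention()  # requires the xformers package
vae.enable_slicing()                               # decode latents in slices
```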

markkua avatar Mar 26 '24 19:03 markkua

Thank you for the information. But I still can't make it work, even with xformers. I have to reduce the resolution to 240x320.

songlin avatar Mar 27 '24 11:03 songlin