Dreambooth-Stable-Diffusion
Any tips to make it run on an A100? (I can only run it on an A6000 so far)
Hi, thanks for providing the code; it's been helpful so far. One thing I have in mind: are there any tips to make it work on an A100? I know there is already a discussion about memory usage somewhere, which says the training pipeline uses 35+ GB. I tried it on an 8x A100 instance last night and it still gives out-of-memory errors. Running on a single A6000 works, though, as the A6000 has slightly more GPU memory than the A100. It's only 48 GB vs. 40 GB...
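For reference, here is a small standalone snippet (not part of the repo) to confirm how much memory each visible GPU actually has. Note that if training replicates the full model on every GPU, as standard data-parallel training does, adding more GPUs does not lower the per-GPU memory requirement.

```python
# Hypothetical quick check, not from the repo: list each visible GPU and its
# total memory, since the A6000 vs. A100 gap above comes down to 48 GB vs. 40 GB.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```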
I was able to run it on a V100 with 32 GB of memory, so there might be some other issue.
Please post your training config.
Nothing special in the training config; I used the same arguments as in the README.
I also could not run the project on a 32 GB V100 when I created the environment from environment.yaml, which installs PyTorch 1.10.2. However, after updating PyTorch to 1.12.1, it runs successfully.
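A minimal sanity check after recreating the environment; the version number is the one mentioned above, not read from the repo:

```python
# Confirm which PyTorch build the environment actually resolved to and that
# CUDA is visible before rerunning training.
import torch

print(torch.__version__)       # expect 1.12.1 per the comment above
print(torch.version.cuda)      # CUDA toolkit the wheel was built against
print(torch.cuda.is_available())
```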
Another trick to reduce memory: this code is based on Textual Inversion, and TI disables gradient checkpointing in a hard-coded way here (https://github.com/rinongal/textual_inversion/blob/main/ldm/modules/diffusionmodules/util.py#L112), because in TI the UNet is not optimized. Here, however, we do optimize the UNet, so we can turn gradient checkpointing back on, as in the original SD repo (https://github.com/CompVis/stable-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L112). Gradient checkpointing already defaults to True in the config (https://github.com/XavierXiao/Dreambooth-Stable-Diffusion/blob/main/configs/stable-diffusion/v1-finetune_unfrozen.yaml#L47); see the sketch below for what re-enabling it in util.py looks like.
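A minimal sketch of what re-enabling checkpointing in `ldm/modules/diffusionmodules/util.py` could look like. The original SD repo routes through its own `CheckpointFunction` when `flag` is True; to keep this snippet self-contained it uses `torch.utils.checkpoint` instead, which serves the same purpose. `flag` is what the `use_checkpoint` option in the model config feeds in.

```python
import torch
from torch.utils.checkpoint import checkpoint as torch_checkpoint


def checkpoint(func, inputs, params, flag):
    """Evaluate `func` without caching intermediate activations when `flag`
    is True, trading extra compute in the backward pass for lower memory.
    `params` is kept for signature compatibility with the repo's version;
    gradients still reach those parameters when the forward is recomputed."""
    if flag:
        # Recompute activations during backward instead of storing them.
        return torch_checkpoint(func, *inputs)
    # Plain forward pass: this branch is effectively what the TI fork hard-codes.
    return func(*inputs)
```

Since `use_checkpoint: True` is already set in v1-finetune_unfrozen.yaml, no config change is needed once the hard-coded bypass is removed.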
Nice trick! It reduces memory usage from 31 GB to 27 GB for me.