ControlNet OOM on 24gb GPU (4090) when running training tutorial

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.64 GiB total capacity; 15.74 GiB already allocated; 1.41 GiB free; 19.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Trying to run the training tutorial out of the box - how much VRAM is needed?

Mar 21 '23 14:03 whydna
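
Editor's note: the figures in the error message above can be read programmatically. The sketch below is only an illustration: `parse_oom` is a hypothetical helper, not part of PyTorch or ControlNet. The gap between reserved and allocated memory is what the message's fragmentation hint refers to.

```python
import re

# The message from the original post, shortened to the part with the figures.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 4.00 GiB "
    "(GPU 0; 23.64 GiB total capacity; 15.74 GiB already allocated; "
    "1.41 GiB free; 19.97 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Extract the GiB figures from a PyTorch CUDA OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        stats[name] = float(match.group(1))
    return stats


stats = parse_oom(OOM_MESSAGE)
# Memory held by PyTorch's caching allocator but not used by live tensors.
cached_but_unused = stats["reserved"] - stats["allocated"]
print(f"requested {stats['requested']:.2f} GiB, "
      f"cached but unused {cached_but_unused:.2f} GiB")
```

Here the cached-but-unused amount is slightly larger than the 4.00 GiB request, which suggests the cache may be too fragmented to serve it. In that case it may be worth setting the environment variable named in the message before starting training, for example `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` (the value 128 is an arbitrary starting point, not a recommendation from this thread).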

Based on my experiments, I think you need at least 29 GB of VRAM to run the tutorial.

Mar 25 '23 01:03 jingyangcarl

Setting save_memory = True will support training on 16 GB of VRAM.

Mar 25 '23 05:03 lllyasviel

Could you tell me where to set it?

Mar 26 '23 05:03 rcc-cubAC

In config.py, line 1.

Mar 27 '23 09:03 dzhcool
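
Editor's note: for readers looking for the exact change, this is roughly what the first line of config.py should look like after the edit. It is a sketch of the edited file, assuming the flag ships set to False; check your checkout before changing it.

```python
# config.py, line 1
save_memory = True  # assumed repository default: False
```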

Based on my experiments, I think you need at least 29 GB of VRAM to run the tutorial.

Thanks, this is what I was looking for.

Is there a way to compute GPU requirements for a given dataset and experiment?

May 01 '23 14:05 shravankumar147
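
Editor's note: nobody in this thread gives a formula, but a rough lower bound can be sketched from parameter counts alone. The function below is a back-of-the-envelope estimate, assuming fp32 weights and an Adam-style optimizer that keeps two extra values per trainable parameter. It ignores activations, which depend on batch size and resolution and often dominate, so real usage will be higher. The function name and the example parameter counts are hypothetical.

```python
GIB = 2 ** 30


def estimate_static_memory_gib(trainable_params, frozen_params=0,
                               bytes_per_param=4, optimizer_states=2):
    """Lower bound on training memory in GiB, excluding activations.

    Counts weights for every parameter, plus gradients and optimizer
    states for the trainable ones only.
    """
    weights = (trainable_params + frozen_params) * bytes_per_param
    gradients = trainable_params * bytes_per_param
    optimizer = trainable_params * bytes_per_param * optimizer_states
    return (weights + gradients + optimizer) / GIB


# Hypothetical round numbers for illustration, not measured from ControlNet.
estimate = estimate_static_memory_gib(trainable_params=400_000_000,
                                      frozen_params=1_000_000_000)
print(f"static memory lower bound: {estimate:.1f} GiB")
```

For an actual measurement, running a few training steps and then reading `torch.cuda.max_memory_allocated()` gives the peak usage for a specific dataset and batch size.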

Could you tell me where to set it?

https://github.com/lllyasviel/ControlNet/blob/d3284fcd0972c510635a4f5abe2eeb71dc0de524/config.py#L1

May 01 '23 14:05 shravankumar147

Setting save_memory = True will support training on 16 GB of VRAM.

Is it possible to train ControlNet with 11 GB of VRAM? @lllyasviel

May 03 '23 08:05 universewill

Setting save_memory = True will support training on 16 GB of VRAM.

Is it possible to train ControlNet with 11 GB of VRAM? @lllyasviel

I also have 11 GB of VRAM. I resized the training data to (128, 128), but it still does not work. Should this help, or did I set the size incorrectly?

In tutorial_dataset.py:

def __getitem__(self, idx):
    item = self.data[idx]

    source_filename = item['source']
    target_filename = item['target']
    prompt = item['prompt']

    source = cv2.imread('./training/fill50k/' + source_filename)
    target = cv2.imread('./training/fill50k/' + target_filename)

    source = cv2.resize(source, (128, 128))  # the change in question: resize to 128x128
    target = cv2.resize(target, (128, 128))  # the change in question: resize to 128x128

    # Do not forget that OpenCV reads images in BGR order.
    source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
    target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)

    # Normalize source images to [0, 1].
    source = source.astype(np.float32) / 255.0

    # Normalize target images to [-1, 1].
    target = (target.astype(np.float32) / 127.5) - 1.0

    return dict(jpg=target, txt=prompt, hint=source)

May 22 '23 12:05 YuanSnowing

Setting save_memory = True will support training on 16 GB of VRAM.

Is it possible to train ControlNet with 11 GB of VRAM? @lllyasviel

I also have 11 GB of VRAM. I resized the training data to (128, 128), but it still does not work. Should this help, or did I set the size incorrectly?

In tutorial_dataset.py:

When I lowered the size further, I got an error:

Traceback (most recent call last):
  ........
  File "/kitti_gen/ControlNet/cldm/cldm.py", line 39, in forward
    h = torch.cat([h, hs.pop()], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.

I wonder why this happens. Maybe some sampling inconsistency makes the shapes differ?

Even with the resolution set to 64, save_memory=True, and only the middle layer in use, the model still runs out of memory. Is there anything else I can do?

May 22 '23 13:05 YuanSnowing
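
Editor's note: one plausible explanation for the shape error above is that the image became too small for the UNet's skip connections to line up. The sketch below is not taken from the ControlNet code. It assumes an 8x reduction from image to latent and three stride-2 downsampling convolutions (kernel 3, padding 1), each mirrored by a 2x upsampling on the way back; `unet_skip_sizes_match` is a hypothetical helper.

```python
def unet_skip_sizes_match(image_size, vae_factor=8, num_downsamples=3):
    """Check whether upsampled feature maps match their skip connections.

    Assumes stride-2 convolutions with kernel 3 and padding 1 on the way
    down and exact 2x upsampling on the way up (assumptions, see lead-in).
    """
    size = image_size // vae_factor
    sizes = [size]
    for _ in range(num_downsamples):
        # Output size of a conv with kernel=3, stride=2, padding=1.
        size = (size - 1) // 2 + 1
        sizes.append(size)

    h = sizes[-1]
    for skip in reversed(sizes[:-1]):
        h *= 2  # 2x upsampling
        if h != skip:
            return False  # torch.cat would fail on mismatched spatial sizes
    return True


for candidate in (512, 128, 96, 64, 32):
    print(candidate, unet_skip_sizes_match(candidate))
```

Under these assumptions only sizes that are multiples of 64 pass. A 32-pixel image fails because an upsampled map of size 2 meets a skip tensor of size 1, which is consistent with the "Expected size 2 but got size 1" message.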

All duplicates concerning "RAM and out of memory exceptions (OOM)":
https://github.com/lllyasviel/ControlNet/issues/21
https://github.com/lllyasviel/ControlNet/issues/33
https://github.com/lllyasviel/ControlNet/issues/191
https://github.com/lllyasviel/ControlNet/issues/236
https://github.com/lllyasviel/ControlNet/issues/241
https://github.com/lllyasviel/ControlNet/issues/247
https://github.com/lllyasviel/ControlNet/issues/294
https://github.com/lllyasviel/ControlNet/issues/301

Sep 17 '23 10:09 geroldmeisinger

@geroldmeisinger have you found a solution to this? As I mentioned elsewhere, commenting that there are duplicates just ends up killing a thread.

Nov 07 '23 20:11 codeundercoverdev

My hope in pointing to the duplicates was to help others find all the information available on one topic and, at the same time, to focus everything on one "main" thread. On the other hand, this is an issue section, not a discussion board, and there should only be one thread per issue.

have you found a solution to this?

You can try the diffusers training script, which claims to run on 8 GB of VRAM using Linux and DeepSpeed (scroll all the way down). Someone also asked for ControlNet-XS support, and I asked for ControlNet Würstchen support, either of which might reduce training requirements, but so far neither has been implemented. If you know of any other "small" ControlNets, please let us know!

Nov 08 '23 07:11 geroldmeisinger
