
VRAM requirements

kanttouchthis opened this issue 2 years ago · 6 comments

The README lists a minimum of 16 GB of VRAM without the stable-x4 upscaler and 24 GB with it. However, you can run it with the stable-x4 upscaler on as little as 6 GB of VRAM by using sequential CPU offload on the first stage/text encoder (in fp16) and model CPU offload on the second and third stages. You can also run all three stages with model CPU offload on 16 GB (maybe less). You do need sufficient system RAM, though.

import torch
from diffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline

stage_1 = IFPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    variant="fp16",
    torch_dtype=torch.float16,
)
stage_2 = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0",
    text_encoder=None,  # reuse the prompt embeddings from stage 1 instead of loading T5 again
    variant="fp16",
    torch_dtype=torch.float16,
)
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)

# ~16 GB of VRAM: model CPU offload on all three stages
stage_1.enable_model_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()

# ~6 GB of VRAM: sequential offload on the first stage, model offload on the others
stage_1.enable_sequential_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()
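
For completeness, here's a rough sketch of how the three stages chain together once offloading is enabled, following the usage pattern from the IF README (the prompt, seed, and output filename are just placeholders):

prompt = "a photo of a kangaroo wearing an orange hoodie"
generator = torch.manual_seed(0)

# encode the prompt once with stage 1's text encoder and reuse it for stage 2
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

image = stage_1(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("if_stage_III.png")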

I tested this on PyTorch 2.0.0+cu118, using torch.cuda.set_per_process_memory_fraction() to limit the amount of VRAM torch is allowed to use. Sequential offload significantly slows down the first stage, but that's better than not being able to run it at all.
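
If you want to reproduce the VRAM cap, it's roughly this (the 0.375 fraction is just an example value for simulating ~6 GB on a 16 GB card):

import torch

# limit this process to a fraction of the device's total VRAM,
# e.g. to simulate a ~6 GB card on a larger GPU (values are illustrative)
torch.cuda.set_per_process_memory_fraction(0.375, device=0)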

kanttouchthis avatar Apr 30 '23 22:04 kanttouchthis

I bought PC components two days ago (with the plan of going for SD, now that this is out...), and now that the minimum requirement grew to 16 GB I regret sticking with the RTX 3060 instead of going for an Intel Arc 😂

You are a lifesaver. I will surely try this out when the components arrive!

Anatoly03 avatar May 01 '23 08:05 Anatoly03

see https://github.com/deep-floyd/IF/pull/61

neonsecret avatar May 01 '23 09:05 neonsecret

@kanttouchthis: What is inference speed like when running it this way (and what are the hardware specs)?

tildebyte avatar May 05 '23 02:05 tildebyte

This didn't work on an RTX 4080 with 16GB of VRAM.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacty of 15.99 GiB of which 10.82 GiB is free. Of the allocated memory 2.11 GiB is allocated by PyTorch, and 729.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

trimad avatar May 05 '23 13:05 trimad

I know that this will probably sound a certain way, but: is this even English? Personally, I'm sick of Torch's horrible technical writing...

torch.cuda.set_per_process_memory_fraction

Set memory fraction for a process. The fraction is used to limit an caching allocator to allocated memory on a CUDA device. The allowed value equals the total visible memory multiplied fraction. If trying to allocate more than the allowed value in a process, will raise an out of memory error in allocator.

tildebyte avatar May 05 '23 13:05 tildebyte

I'd love to see a full script and not just some random snippets...

IIIIIIIllllllllIIIII avatar May 06 '23 13:05 IIIIIIIllllllllIIIII