
Available for low VRAM?

Open · lsycxyj opened this issue on Jul 26, 2023 · 14 comments

For example, common 8GB graphics cards?

lsycxyj · Jul 26 '23 08:07

I think it's 12GB or a bit more.

actionless · Jul 29 '23 01:07

On mine it uses between 13.5GB (while generating the first gif) and 18.3GB (while generating the second one, for when you have it set to try both motion models on each prompt). I use a Tesla M40 card, which is pre-RTX though (which might mean slightly more VRAM usage compared to newer card architectures?). You could probably keep it at just 13.5GB by running only the 1.4 motion model and not the 1.5 one (1.4 usually creates better movement anyway), so that it could fit into a 16GB card at least. The best experience is on a 24GB card though. If running it via the script, you can comment out the 1.5 model in the .yaml file.

Note that my xformers might not be working; I'm not fully sure how to check in the logs whether xformers is actually being used in AnimateDiff, or if I have it misconfigured, so that could partially explain why I see higher usage than the 12GB the GitHub page states. Also, I use the TalesofAI version because I rely on its Initial Image feature, so that could be a slight factor too.
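
A quick, generic way to probe whether xformers and its memory-efficient attention op are usable in your environment (this is just an environment sanity check, not an AnimateDiff-specific log check; the tensor shapes are arbitrary):

# Probe whether xformers is importable and its attention op runs on this GPU.
import torch

try:
    import xformers
    import xformers.ops

    print("xformers version:", xformers.__version__)
    # Arbitrary small tensors in [batch, seq_len, heads, head_dim] layout.
    q = torch.randn(1, 16, 8, 40, device="cuda", dtype=torch.float16)
    out = xformers.ops.memory_efficient_attention(q, q, q)
    print("memory_efficient_attention OK, output shape:", tuple(out.shape))
except Exception as exc:
    print("xformers not usable:", exc)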

Bendito999 · Jul 29 '23 05:07

I was able to run this on an 8GB VRAM card. Go to scripts/animate.py and change line 105

from

pipeline.to("cuda")

to

pipeline.enable_sequential_cpu_offload()

It's running at 6.34s/it on my RTX 2080 S. It's slow, but it's working.

You also need to install accelerate.
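
For context, a minimal sketch of what that change looks like in a generic diffusers pipeline; DiffusionPipeline and the model ID here are illustrative stand-ins, since scripts/animate.py builds its own animation pipeline:

import torch
from diffusers import DiffusionPipeline

# Illustrative stand-in; animate.py constructs its own pipeline class.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# pipeline.to("cuda")  # keeps the whole pipeline resident in VRAM
pipeline.enable_sequential_cpu_offload()  # parks submodules in system RAM and
# moves each one to the GPU only for its forward pass (requires `accelerate`)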

kiddos · Jul 29 '23 17:07

> It's running at 6.34s/it on my RTX 2080 S. It's slow, but it's working.

I've tried on a 1080 and it's around 15s/it 🥴

I wasn't expecting such a big difference:

https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+GTX+1080&id=3502
https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+RTX+2080+SUPER&id=4123

Did you apply any other optimizations? What's your accelerate YAML config?

actionless · Jul 29 '23 17:07

I was using the default accelerate config:

{
  "compute_environment": "LOCAL_MACHINE",
  "distributed_type": "NO",
  "downcast_bf16": false,
  "machine_rank": 0,
  "main_training_function": "main",
  "mixed_precision": "no",
  "num_machines": 1,
  "num_processes": 1,
  "rdzv_backend": "static",
  "same_network": false,
  "tpu_use_cluster": false,
  "tpu_use_sudo": false,
  "use_cpu": false
}

and the pre-compiled versions of torch and torchvision.

I find that if PyTorch is compiled from source, it runs a bit faster on your local hardware, so you might want to try that.

> I've tried on a 1080 and it's around 15s/it 🥴

It also has to do with CPU and RAM speed, since enable_sequential_cpu_offload() requires offloading the model to the CPU first and running it on the GPU only during actual inference.
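
A rough illustration of why that matters (this is a simplification, not accelerate's actual hook machinery):

import torch

# Simplified picture of sequential offload: each submodule is copied
# host -> device right before its forward pass and back afterwards, so
# every denoising step pays RAM/PCIe transfer costs on top of the compute.
def run_offloaded(module: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    module.to("cuda")          # host -> device copy (bound by RAM/PCIe speed)
    y = module(x.to("cuda"))   # the actual GPU compute
    module.to("cpu")           # device -> host copy frees the VRAM again
    return y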

kiddos · Jul 29 '23 18:07

Thanks, I'll try tonight with your config to see if there is any difference at all.

I was trying to optimize things in my config, but maybe I only made things worse.

Update:

Ah, and my hardware: memory: 8×8GiB DDR3 1600MHz; CPU: 2× E5-2680 @ 3500MHz.

I think the CPU might be the bottleneck, because on another machine with a newer CPU things are faster. However, the most I could put in this motherboard is an E5-2687W v1 (which is only marginally faster than the 2680), so I'm not really sure it's worth it.

actionless · Jul 29 '23 18:07

I've tried the default config (as you posted above) and a few other tweaks beyond what I had before, and it's still staying around 15s/it, so I think it's the hardware.

actionless · Jul 29 '23 20:07

With the A1111 extension it works at resolutions below 256x256 on my RX 6600; otherwise it's always a memory problem.

patientx · Aug 01 '23 14:08

@patientx even with this workaround?

actionless · Aug 01 '23 14:08

Couldn't find where to change it in the A1111 extension; there is no value like that there.

patientx · Aug 01 '23 23:08

Is this compatible with a 4GB 3050 Ti card?

VimukthiRandika1997 · Dec 31 '23 00:12

@VimukthiRandika1997 You'll have to lower the resolution: try something like 128x128 or 256x256 and gradually increase or decrease it to fit your VRAM, and make sure to apply the workaround above.
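
A hypothetical helper for that trial-and-error; generate is a stand-in for whatever call actually runs the pipeline, not a real function from this repo:

import torch

def find_max_resolution(generate, sizes=(512, 384, 256, 128)):
    """Walk down a list of square resolutions until one fits in VRAM."""
    for size in sizes:
        try:
            generate(width=size, height=size)
            return size  # first resolution that completed without OOM
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # drop cached blocks before the next try
    return None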

actionless · Jan 01 '24 21:01

@actionless Thank you for the info. I will try those approaches!

VimukthiRandika1997 · Jan 02 '24 07:01

> I was able to run this on an 8GB VRAM card. Go to scripts/animate.py and change line 105
>
> from
>
> pipeline.to("cuda")
>
> to
>
> pipeline.enable_sequential_cpu_offload()
>
> It's running at 6.34s/it on my RTX 2080 S. It's slow, but it's working.
>
> You also need to install accelerate.

However, I did not find pipeline.to("cuda") in animate.py, and my line 105 is controlnet_images = torch.stack(controlnet_images).unsqueeze(0).cuda()

traveling121 · Mar 20 '24 12:03