AnimateDiff
Available for low VRAM?
For example, common 8GB graphics cards?
i think it's 12GB or a bit more
on mine it uses between 13.5GB (when doing the first gif) and 18.3GB (when it does the 2nd gif, for when you have it set to try both models on each). I use a Tesla M40 card which is pre-RTX though (which might mean slightly more VRAM usage compared to newer card architectures?). Could probably make it take up just 13.5GB by having it do only the 1.4 motion model and not the 1.5 one (1.4 usually creates better movement anyway), so that it could fit into a 16GB card at least. Best experience is on a 24GB card though. If doing it via script, you can comment out the 1.5 model in the .yaml file.
Note that my xformers might not be working; I'm not fully sure how to check in the logs whether xformers is actually being used in AnimateDiff, or if I have it misconfigured, so that could partially explain why I see higher usage than the 12GB that the GitHub page says. Also, I use the TalesofAI version as I use the Initial Image feature, so that could also be a slight factor.
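If it helps, here is a minimal sketch (not from the AnimateDiff code, just the standard diffusers API) for checking whether xformers is importable and enabling its memory-efficient attention on whatever pipeline object the script builds:

```python
import importlib.util

def enable_xformers_if_available(pipeline) -> bool:
    """Try to switch a diffusers-style pipeline to xformers attention.

    Returns True if xformers was found and enabled, False otherwise.
    """
    if importlib.util.find_spec("xformers") is None:
        print("xformers not installed; using the default attention implementation")
        return False
    import xformers
    print(f"found xformers {xformers.__version__}")
    # Standard diffusers pipeline method; raises if xformers is present but unusable.
    pipeline.enable_xformers_memory_efficient_attention()
    return True
```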
I was able to run this using an 8GB VRAM card. Go to scripts/animate.py and change line 105
from
pipeline.to("cuda")
to
pipeline.enable_sequential_cpu_offload()
it's running at 6.34s/it on my RTX 2080 S. It's slow, but it's working.
And you need to install accelerate.
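For reference, a minimal sketch of what the edited spot can look like (the exact line number and variable name may differ between versions of the repo):

```python
# scripts/animate.py - after the pipeline object has been built.

# Original: keep the whole pipeline resident on the GPU (needs well over 8GB VRAM).
# pipeline.to("cuda")

# Low-VRAM alternative: keep weights in system RAM and move submodules to the
# GPU one at a time during inference. Slower per iteration, but it fits on an
# 8GB card. Requires accelerate to be installed (pip install accelerate).
pipeline.enable_sequential_cpu_offload()
```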
it's running at 6.34s/it on my RTX 2080 S. It's slow, but it's working.
i've tried on 1080 and it's around 15s/it 🥴
i wasn't expecting such a big difference:
https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+GTX+1080&id=3502 https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+RTX+2080+SUPER&id=4123
did you apply any other optimizations? what's your accelerate yml config?
I was using the default accelerate config:
{
"compute_environment": "LOCAL_MACHINE",
"distributed_type": "NO",
"downcast_bf16": false,
"machine_rank": 0,
"main_training_function": "main",
"mixed_precision": "no",
"num_machines": 1,
"num_processes": 1,
"rdzv_backend": "static",
"same_network": false,
"tpu_use_cluster": false,
"tpu_use_sudo": false,
"use_cpu": false
}
and a pre-compiled version of torch and torchvision.
I find that if PyTorch is compiled from source, it runs a bit faster on your local hardware, so you might want to try that.
i've tried on 1080 and it's around 15s/it 🥴
It also has to do with CPU and RAM speed, since enable_sequential_cpu_offload() requires offloading the model to the CPU first and running it on the GPU during actual inference.
thanks, i'll try tonight with your config to see if there is any difference at all
i was trying to optimize things in my config, but maybe i made things only worse
UPD:
ah, and memory: 8x 8 GiB DDR3 1600 MHz, CPU: 2x E5-2680 @ 3500 MHz
i think the cpu might be the bottleneck - on another machine with a newer cpu things are faster. however, the most i could put on this mobo is an E5-2687W v1 (which is only marginally faster than the 2680), so i'm not really sure it's worth it
i've tried the default config (as you posted above) and a few other tweaks beyond what i had before - and it's still staying around 15s/it, so i think it's the hardware
With the A1111 extension it works at resolutions below 256x256 on my RX 6600, otherwise I always get a memory problem.
@patientx even with this workaround?
couldn't find where to change it in the A1111 extension, there is no value like that there
Is this compatible with a 4GB 3050 Ti card?
@VimukthiRandika1997 you'll have to lower the resolution: try something like 128x128 or 256x256 and gradually increase/decrease to fit your VRAM, and make sure to apply the workaround above
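If you are running the standalone script rather than the extension, a rough sketch of what a low-resolution call could look like (a hypothetical helper; it assumes the pipeline accepts diffusers-style width/height/video_length keyword arguments):

```python
def render_low_vram(pipeline, prompt: str):
    # Hypothetical helper, not from the repo: smaller frames and fewer frames
    # both lower peak VRAM, so start small and scale up until you hit the limit.
    return pipeline(
        prompt,
        width=256,        # drop toward 128x128 if 256x256 still runs out of memory
        height=256,
        video_length=16,  # fewer frames also reduces memory use
    )
```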
@actionless Thank you for the info. I will try those approaches!
I was able to run this using an 8GB VRAM card. Go to scripts/animate.py and change line 105 from pipeline.to("cuda") to pipeline.enable_sequential_cpu_offload(). It's running at 6.34s/it on my RTX 2080 S. It's slow, but it's working. And you need to install accelerate.
However, I did not find pipeline.to("cuda") in animate.py, and my line 105 is controlnet_images = torch.stack(controlnet_images).unsqueeze(0).cuda()