Can't get good quality wan 2.1 videos

Open KintCark opened this issue 2 months ago • 9 comments

What are the best settings for WAN 2.1? I can't get animations with good visual quality; it's just mush and artifacts. I even tried CausVid, but that didn't work either. I don't know how to use WAN 2.1, and it's the only model that works with my limited RAM.

KintCark avatar Oct 15 '25 03:10 KintCark

What are your system specs? CPU/System RAM/GPU +VRAM?

MrSnichovitch avatar Oct 16 '25 22:10 MrSnichovitch

What are your system specs? CPU/System RAM/GPU +VRAM?

CPU: Snapdragon 865

RAM: 10602 MB

GPU: Adreno 650

VRAM: N/A

KintCark avatar Oct 21 '25 01:10 KintCark

Can you supply a sample prompt you're attempting to use? If you're not using the --diffusion-fa option, it might be necessary as it seems to be with Vulkan and ROCm, but I can't specifically say for sure since I don't have an ARM-based/Adreno GPU system to test with.

Here's an example prompt I've run on Vulkan using the standard WAN2.1 T2V model and VAE, but with the Q5_K_M text encoder to save RAM. The process used ~500 MiB of system RAM while running, but I'll warn you that the VAE stage used ~5.4 GiB of VRAM at peak, so limited memory could still pose a problem.

./sd -M vid_gen \
--diffusion-model models/checkpoints/wan2.1_t2v_1.3B_fp16.safetensors \
--vae models/vae/wan_2.1_vae.safetensors \
--t5xxl models/text_encoders/umt5-xxl-encoder-Q5_K_M.gguf \
--lora-model-dir models/loras --embd-dir models/embeddings \
-s -1 --cfg-scale 5 --steps 20 \
--sampling-method euler --scheduler simple \
-W 384 -H 216 \
--fps 12 --video-frames 85 \
-v \
-p "medium shot, a lovely cat in a carpeted room, turning to face the camera and walking toward it" \
-n "anime, cartoon, drawing, 3d render, cgi render, ai generated, ugly face" \
-o ../VidGenOutputs/wan2.1_t2v_1.3B_fp16_2025-10-20_test02.avi --diffusion-fa

Resulting vid:

https://github.com/user-attachments/assets/68f9da8f-ae18-4266-8629-95122f5e600b
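If memory is the bottleneck, a lower-footprint variant of the same invocation could be tried. This is only a sketch: the q2_k model and Q3_K_M encoder filenames are assumptions based on commonly published quants (substitute whatever files you actually have), and quality will degrade at these sizes.

```shell
# Hypothetical lower-memory run: smaller quants, fewer frames, smaller canvas.
# Filenames below are assumptions; adjust paths to your local models.
./sd -M vid_gen \
    --diffusion-model models/checkpoints/wan2.1_t2v_1.3b-q2_k.gguf \
    --vae models/vae/wan_2.1_vae.safetensors \
    --t5xxl models/text_encoders/umt5-xxl-encoder-Q3_K_M.gguf \
    -s -1 --cfg-scale 5 --steps 20 \
    --sampling-method euler --scheduler simple \
    -W 256 -H 144 \
    --fps 12 --video-frames 33 \
    -p "medium shot, a lovely cat in a carpeted room" \
    -o out_lowmem.avi --diffusion-fa
```

Halving the resolution and cutting the frame count shrinks both the latent tensors and the VAE decode peak, which is usually where constrained systems fall over.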

MrSnichovitch avatar Oct 21 '25 02:10 MrSnichovitch

I mean, I don't think you should expect amazing quality out of the 1.3B model.

stduhpf avatar Oct 21 '25 09:10 stduhpf

@stduhpf You're very right... you shouldn't expect miracles. But you should at least be able to get viable clips from it, which is what my particular example clip did. It followed the prompt and produced what was intended.

MrSnichovitch avatar Oct 21 '25 18:10 MrSnichovitch

@stduhpf You're very right... you shouldn't expect miracles. But you should at least be able to get viable clips from it, which is what my particular example clip did. It followed the prompt and produced what was intended.

My problem is that I don't want to wait so long just to get an unfinished video; my frame rate is too low. You did 85 frames, and I forgot how many steps, but we need FastWan support. It only requires 3 steps, so we wouldn't need to wait so long.
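As a rough sanity check on that hope: diffusion sampling time scales approximately linearly with the number of denoising steps (ignoring fixed costs like model loading and the VAE decode), so a 3-step distilled model versus the 20-step run above would cut the sampling loop by roughly:

```python
# Back-of-envelope: sampling time ~ proportional to step count.
# Ignores fixed overheads (model load, text encoding, VAE decode).
def step_speedup(baseline_steps: int, distilled_steps: int) -> float:
    return baseline_steps / distilled_steps

print(round(step_speedup(20, 3), 1))  # prints 6.7
```

So a 3-step model would be worth pursuing, but only if the distilled weights actually fit in memory and produce coherent output on this backend.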

KintCark avatar Oct 22 '25 22:10 KintCark

How were you able to use the WAN 2.1 1.3B fp16 model with the umt5-xxl Q5_0 encoder? I have 10 GB of RAM with 8 GB free. I tried WAN 2.1 1.3B Q8 and umt5 Q5, but it hit an out-of-memory error and Termux crashed.

KintCark avatar Nov 08 '25 03:11 KintCark

How were you able to use the WAN 2.1 1.3B fp16 model with the umt5-xxl Q5_0 encoder? I have 10 GB of RAM with 8 GB free. I tried WAN 2.1 1.3B Q8 and umt5 Q5, but it hit an out-of-memory error and Termux crashed.

I mentioned previously that I don't have an ARM-based/Adreno GPU system to test with, so I'm using desktop hardware with more RAM and a GPU with dedicated VRAM to work with. There's no real way for me to give you an apples-to-apples comparison using the Linux system I have vs. the Android system you're using.

When stable-diffusion.cpp runs, it loads/buffers the model, VAE, and text encoder tensors into RAM before processing begins in earnest, meaning that even if you were to use the smallest quants available, such as these:

wan2.1_t2v_1.3b-q2_k.gguf       552.4 MiB
wan_2.1_vae.safetensors         242.1 MiB
umt5-xxl-encoder-Q3_K_M.gguf    2.8 GiB
                                3.62 GiB total

...you'd still be using up 3.62 GiB of your available 8 GiB of RAM before the compute stages even start, and those stages consume additional RAM/VRAM on top of that.
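The arithmetic above can be sketched as a quick script. The file sizes come from the list above; the small gap versus the quoted 3.62 GiB total comes from the encoder size being rounded to 2.8 GiB.

```python
# Rough load-time RAM estimate from model file sizes. For GGUF/safetensors,
# file size approximates the bytes of tensor data that get buffered into RAM;
# actual usage adds compute buffers on top.
MIB = 1024 ** 2
GIB = 1024 ** 3

files_mib = {
    "wan2.1_t2v_1.3b-q2_k.gguf": 552.4,
    "wan_2.1_vae.safetensors": 242.1,
    "umt5-xxl-encoder-Q3_K_M.gguf": 2.8 * 1024,  # 2.8 GiB expressed in MiB
}

total_gib = sum(files_mib.values()) * MIB / GIB
print(f"{total_gib:.2f} GiB")  # prints "3.58 GiB"
```

On an 8 GiB-free system that leaves under 4.5 GiB for latents, activations, and the VAE decode, which is why the peak-usage numbers matter more than the file sizes alone.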

I've tested those small quants on my system using both the Vulkan and ROCm backends, and I can't get them to produce anything besides blurry shapes. It should be noted that WAN was coded to use CUDA on NVidia hardware, and its poor performance in Vulkan and ROCm definitely shows. I have no idea whether the small quants function properly on NVidia, since I have no hardware to test with.

I don't want to be discouraging, but ultimately, you may be tilting at windmills trying to get WAN to generate anything usable with the limited resources you have. Even the full fp16 version of WAN2.1 1.3B doesn't run under Vulkan all that well on my system (ROCm is consistently better), and I still need the fp16 version to get anything usable. I'm afraid there's no information I can give you that might help.

MrSnichovitch avatar Nov 08 '25 20:11 MrSnichovitch

Well, I managed to get WAN Q5_K_S, umt5-xxl-encoder-Q4_K_S.gguf, and WAN fp8 loaded, but I can't manage to get results; it only worked once. I tried using the Self Forcing, CausVid, and CFG-distilled LoRAs, but I still can't get good results.

KintCark avatar Nov 09 '25 23:11 KintCark