'--force-fp16' WORKS on GTX 1660 Super, but only SOMETIMES and with a tweak
The GTX 16 series is known to have problems with its FP16 support, as shown in pytorch/pytorch#77955 and huggingface/diffusers#2153, which makes running ComfyUI in half precision "impossible" (AUTOMATIC1111 won't even work without `--no-half`), but I finally managed to fix it on my 1660 Super.
My steps:
- Adding `torch.backends.cudnn.enabled = True` and `torch.backends.cudnn.benchmark = True` to `nodes.py` after line 32 (just below the imports) to enable and optimize cuDNN. It doesn't work without this; see the snippet after this list.
- Starting ComfyUI with `--preview-method latent2rgb`. I couldn't make it work with previews disabled.
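For reference, this is the whole patch; a minimal sketch assuming `nodes.py` still has its imports near the top (the exact line number may differ between versions):

```python
import torch  # already imported in ComfyUI's nodes.py; shown here for completeness

# Enable cuDNN and let it autotune convolution algorithms for fixed input shapes.
torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True
```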
I tested each step by running generation ~8 times with and without it.
Env and context:
- Args: `--use-pytorch-cross-attention --force-fp16` (full launch command after this list)
- torch: nightly with CUDA 12.1
- Running ComfyUI in WSL2
- Used model: SD2.1 based checkpoint with lora
- DDIM sampler and scheduler
- 512x512 outputs
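Putting the arguments together, the launch command looks like this (assuming the standard `main.py` entry point):

```
python main.py --use-pytorch-cross-attention --force-fp16 --preview-method latent2rgb
```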
Sometimes it still fails and produces a black image, as if the fix weren't applied:
DDIM Sampler: 100%|███████████████████████████████████████████████| 20/20 [00:58<00:00, 2.93s/it]
/home/<user>/ComfyUI/nodes.py:1143: RuntimeWarning: invalid value encountered in cast
img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
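That warning is the telltale sign of NaNs reaching the image-save step: `np.clip` passes NaN through unchanged, and casting NaN to uint8 is undefined (it often ends up as 0, hence the black image). A quick standalone repro of the cast behavior:

```python
import numpy as np

# NaNs survive np.clip, and the uint8 cast then raises the same
# "invalid value encountered in cast" RuntimeWarning as in the log above.
i = np.full((4, 4), np.nan, dtype=np.float32)
clipped = np.clip(i, 0, 255)
print(clipped)                   # still all NaN
print(clipped.astype(np.uint8))  # RuntimeWarning; resulting values are undefined
```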
Output images are almost identical to those generated without `--force-fp16`, but VRAM usage is nearly halved, as expected. Surprisingly, FP16 mode makes inference 4-5 times slower. Even with the fix, it sometimes takes a few restarts of ComfyUI before it stops producing black images.
I hope it helps.
Yes fp16 is a lot slower. That's the reason why ComfyUI always uses fp32 on those cards.
What if someone needs to run a big model? I would prefer to have the option even if it's slow.
Hello, may I ask if a 1660 6GB graphics card can run ComfyUI? What resolution of images can it generate? Can SDXL 0.9 be used on a 6GB graphics card? Thank you very much. I have a 1660 6GB graphics card and currently can only use SD.
@wzgrx I have the 1660 Super and I can generate SD 512x512 images in under 20 seconds. There is a lot of misinformation and many suboptimal configurations for SD regarding these particular 16XX cards, so check out this post to configure SD webui for GTX 1660.
The implementation of FP16 in these cards is different, and PyTorch does not seem to understand/handle it properly yet, resulting in increased VRAM usage and degraded performance, even though the card is actually supposed to handle FP16 operations much faster. Anyway, I'm very satisfied with the card's performance with SD, especially considering that I bought a used one for a fraction of the price.
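If you want to check whether your card's fp16 path misbehaves before patching anything, here is a hypothetical sanity check (not part of ComfyUI or PyTorch): run a small half-precision convolution on the GPU and look for NaNs/Infs in the output:

```python
import torch

# Hypothetical helper: returns False if a plain fp16 conv produces NaNs/Infs,
# which is the same failure mode that shows up as black images.
def fp16_conv_is_healthy(device="cuda"):
    x = torch.randn(1, 4, 64, 64, device=device, dtype=torch.float16)
    w = torch.randn(8, 4, 3, 3, device=device, dtype=torch.float16)
    y = torch.nn.functional.conv2d(x, w, padding=1)
    return torch.isfinite(y).all().item()

if torch.cuda.is_available():
    print("fp16 conv looks healthy:", fp16_conv_is_healthy())
```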
Okay, thank you for the answer