'--force-fp16' WORKS on GTX 1660 Super, but only SOMETIMES and with a tweak
The GTX 16 series is known to have problems with its FP16 support, as shown in pytorch/pytorch#77955 and huggingface/diffusers#2153, which makes running ComfyUI in half precision "impossible" (AUTOMATIC1111 won't even work without `--no-half`), but I finally managed to fix it on my 1660 Super.
My steps:
- Adding `torch.backends.cudnn.enabled = True` and `torch.backends.cudnn.benchmark = True` to `nodes.py` after line 32 (just below the imports) to enable and optimize cuDNN. It doesn't work without this; see the snippet after this list.
- Starting ComfyUI with `--preview-method latent2rgb`. I couldn't make it work with previews disabled.
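For reference, this is the whole patch; a minimal sketch assuming `nodes.py` still has its imports near the top (the exact line number may differ between versions):

```python
import torch  # already imported in ComfyUI's nodes.py; shown here for completeness

# Enable cuDNN and let it autotune convolution algorithms for fixed input shapes.
torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True
```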
I tested each step by running generation ~8 times with and without it.
Env and context:
- Args: `--use-pytorch-cross-attention --force-fp16` (full launch command after this list)
- torch: nightly with CUDA 12.1
- Running ComfyUI in WSL2
- Used model: SD2.1 based checkpoint with lora
- DDIM sampler and scheduler
- 512x512 outputs
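Putting the arguments together, the launch command looks like this (assuming the standard `main.py` entry point):

```
python main.py --use-pytorch-cross-attention --force-fp16 --preview-method latent2rgb
```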
Sometimes it still fails and produces a black image, as if the fix weren't applied:
DDIM Sampler: 100%|███████████████████████████████████████████████| 20/20 [00:58<00:00, 2.93s/it]
/home/<user>/ComfyUI/nodes.py:1143: RuntimeWarning: invalid value encountered in cast
img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
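That warning is the telltale sign of NaNs reaching the image-save step: `np.clip` passes NaN through unchanged, and casting NaN to uint8 is undefined (it often ends up as 0, hence the black image). A quick standalone repro of the cast behavior:

```python
import numpy as np

# NaNs survive np.clip, and the uint8 cast then raises the same
# "invalid value encountered in cast" RuntimeWarning as in the log above.
i = np.full((4, 4), np.nan, dtype=np.float32)
clipped = np.clip(i, 0, 255)
print(clipped)                   # still all NaN
print(clipped.astype(np.uint8))  # RuntimeWarning; resulting values are undefined
```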
Output images are almost identical to those generated without `--force-fp16`, but VRAM usage is nearly halved, as expected. Surprisingly, FP16 mode makes inference 4-5 times slower. Even with the fix, it sometimes takes a few restarts of ComfyUI before it stops producing black images.
I hope it helps.
Yes fp16 is a lot slower. That's the reason why ComfyUI always uses fp32 on those cards.
What if someone needs to run a big model? I would prefer to have the option even if it's slow.
Hello, may I ask if a 1660 6GB graphics card can run ComfyUI? What resolution of images can it generate? Can SDXL 0.9 be used on a 6GB graphics card? Thank you very much. I have a 1660 6GB graphics card and currently can only use SD.
@wzgrx I have the 1660 Super and I can generate SD 512x512 images in under 20 seconds. There is a lot of misinformation and many suboptimal configurations for SD regarding these particular 16XX cards, so check out this post to configure SD webui for GTX 1660.
The implementation of FP16 in these cards is different, and PyTorch does not seem to understand/handle it properly yet, resulting in increased VRAM usage and degraded performance, even though the card is actually supposed to handle FP16 operations much faster. Anyway, I'm very satisfied with the card's performance with SD, especially considering that I bought a used one for a fraction of the price.
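If you want to check whether your card's fp16 path misbehaves before patching anything, here is a hypothetical sanity check (not part of ComfyUI or PyTorch): run a small half-precision convolution on the GPU and look for NaNs/Infs in the output:

```python
import torch

# Hypothetical helper: returns False if a plain fp16 conv produces NaNs/Infs,
# which is the same failure mode that shows up as black images.
def fp16_conv_is_healthy(device="cuda"):
    x = torch.randn(1, 4, 64, 64, device=device, dtype=torch.float16)
    w = torch.randn(8, 4, 3, 3, device=device, dtype=torch.float16)
    y = torch.nn.functional.conv2d(x, w, padding=1)
    return torch.isfinite(y).all().item()

if torch.cuda.is_available():
    print("fp16 conv looks healthy:", fp16_conv_is_healthy())
```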
Okay, thank you for the answer