# SDNQ support

### Feature Idea
SDNQ provides low-bit quantization with good quality and performance. Incorporating it for on-the-fly quantization and for loading pre-quantized models would be great, especially for larger models like FLUX.2, where fp8 is too large even for 24 GB GPUs. I tried writing a custom node for this but failed because ComfyUI's model and VRAM management got in the way. Compared to Nunchaku, this approach doesn't depend on a custom model implementation from another dev team for each new model.
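For illustration, here is roughly what the on-the-fly path looks like on the diffusers side today, which a ComfyUI integration would need to replicate under its own model management. This is only a sketch under assumptions: `SDNQConfig` and its `weights_dtype` parameter follow the SDNQ project's diffusers integration as I understand it, and the model ID is just a placeholder.

```python
# Sketch: on-the-fly SDNQ quantization via diffusers.
# SDNQConfig / weights_dtype are assumed from the sdnq project's
# diffusers integration; the model ID is a placeholder.
import torch
from diffusers import FluxTransformer2DModel
from sdnq import SDNQConfig  # assumed import path

# Quantize the transformer weights to uint4 while loading.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder model ID
    subfolder="transformer",
    quantization_config=SDNQConfig(weights_dtype="uint4"),
    torch_dtype=torch.bfloat16,
)
```

A checkpoint saved from such a model should then reload with a plain `from_pretrained` call, since diffusers stores the quantization config alongside the weights; that would cover the pre-quantized case.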
### Existing Solutions
ComfyUI-SDNQ is a vibe-coded custom node that doesn't actually work.
### Other
No response
SDNQ models in the diffusers format, for reference.

Performance on an RTX 3090 (24 GB) with 64 GB of system RAM. The SDNQ run uses quantized_matmul=True and torch.compile with the inductor backend; the ComfyUI run has no torch.compile due to OOM.
| Method | Speed | VRAM |
|---|---|---|
| SDNQ uint4 flashattn | 2.1 s/it | 22 GB |
| ComfyUI fp8mixed sageattn | 4.6 s/it | 36 GB |
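For completeness, here is a sketch of the SDNQ-side settings behind these numbers. `use_quantized_matmul` is my assumed spelling of the quantized_matmul=True option mentioned above, and the model ID is again a placeholder rather than the exact checkpoint used.

```python
# Sketch of the SDNQ benchmark configuration described above; the exact
# sdnq parameter names are assumptions.
import torch
from diffusers import FluxTransformer2DModel
from sdnq import SDNQConfig  # assumed import path

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder model ID
    subfolder="transformer",
    quantization_config=SDNQConfig(
        weights_dtype="uint4",      # matches the uint4 row above
        use_quantized_matmul=True,  # the quantized_matmul=True setting
    ),
    torch_dtype=torch.bfloat16,
).to("cuda")

# torch.compile with the inductor (default) backend, as in the SDNQ run.
transformer = torch.compile(transformer, backend="inductor")
```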