# SDNQ support

### Feature Idea
SDNQ provides low-bit quantization with good quality and performance. Incorporating it for on-the-fly quantization and for loading pre-quantized models would be great, especially for larger models like FLUX.2, where fp8 is too large even for 24 GB GPUs. I tried writing a custom node for this but failed because ComfyUI's model and VRAM management got in the way. Compared to Nunchaku, this approach doesn't depend on a custom model implementation from another dev team for each new model.
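For illustration, here is roughly what the on-the-fly path looks like on the diffusers side today, which a ComfyUI integration would need to replicate under its own model management. This is only a sketch under assumptions: `SDNQConfig` and its `weights_dtype` parameter follow the SDNQ project's diffusers integration as I understand it, and the model ID is just a placeholder.

```python
# Sketch: on-the-fly SDNQ quantization via diffusers.
# SDNQConfig / weights_dtype are assumed from the sdnq project's
# diffusers integration; the model ID is a placeholder.
import torch
from diffusers import FluxTransformer2DModel
from sdnq import SDNQConfig  # assumed import path

# Quantize the transformer weights to uint4 while loading.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder model ID
    subfolder="transformer",
    quantization_config=SDNQConfig(weights_dtype="uint4"),
    torch_dtype=torch.bfloat16,
)
```

A checkpoint saved from such a model should then reload with a plain `from_pretrained` call, since diffusers stores the quantization config alongside the weights; that would cover the pre-quantized case.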
### Existing Solutions
ComfyUI-SDNQ is a vibe-coded custom node that doesn't actually work.
### Other
No response
SDNQ models in the diffusers format, for reference.

Performance on an RTX 3090 (24 GB) with 64 GB of system RAM. The SDNQ run uses quantized_matmul=True and torch.compile with the inductor backend; the ComfyUI run has no torch.compile due to OOM.
| Method | Speed | VRAM |
|---|---|---|
| SDNQ uint4 flashattn | 2.1 s/it | 22 GB |
| ComfyUI fp8mixed sageattn | 4.6 s/it | 36 GB |
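For completeness, here is a sketch of the SDNQ-side settings behind these numbers. `use_quantized_matmul` is my assumed spelling of the quantized_matmul=True option mentioned above, and the model ID is again a placeholder rather than the exact checkpoint used.

```python
# Sketch of the SDNQ benchmark configuration described above; the exact
# sdnq parameter names are assumptions.
import torch
from diffusers import FluxTransformer2DModel
from sdnq import SDNQConfig  # assumed import path

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder model ID
    subfolder="transformer",
    quantization_config=SDNQConfig(
        weights_dtype="uint4",      # matches the uint4 row above
        use_quantized_matmul=True,  # the quantized_matmul=True setting
    ),
    torch_dtype=torch.bfloat16,
).to("cuda")

# torch.compile with the inductor (default) backend, as in the SDNQ run.
transformer = torch.compile(transformer, backend="inductor")
```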