⚡ Request for SVDQuant Checkpoint Support & Development of FLUX fn4 with Nunchaku Technology
Description:
This issue proposes adding support for SVDQuant checkpoints and developing a custom FLUX fn4 checkpoint using Nunchaku's SVDQuant, a 4-bit quantization technique. SVDQuant reduces memory usage and inference latency by absorbing weight outliers into a low-rank branch so that the residual quantizes cleanly to 4 bits, which makes it a robust option for memory-constrained environments without degrading output quality (a minimal sketch of the decomposition follows below).
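To make the idea concrete, here is a minimal PyTorch sketch of the decomposition, not Nunchaku's implementation: the top singular components absorb the weight outliers into a low-rank branch, and the residual is naively quantized to 4 bits. The rank, the per-tensor scale, and the omission of SVDQuant's activation-smoothing step are all simplifications.

```python
import torch

def svdquant_decompose(W: torch.Tensor, rank: int = 32, n_bits: int = 4):
    """Split W into a low-rank branch (absorbs outliers) plus a 4-bit residual."""
    # The top-`rank` singular components capture the large-magnitude
    # directions of W, leaving a residual that quantizes well.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]    # (out, rank)
    L2 = Vh[:rank, :]              # (rank, in)
    R = W - L1 @ L2
    # Naive symmetric per-tensor 4-bit quantization of the residual.
    qmax = 2 ** (n_bits - 1) - 1   # 7 for 4 bits
    scale = R.abs().max() / qmax
    R_q = torch.clamp(torch.round(R / scale), -qmax - 1, qmax)
    return L1, L2, R_q, scale

def svdquant_forward(x, L1, L2, R_q, scale):
    # Inference combines the 4-bit residual branch with the low-rank
    # branch; Nunchaku fuses these two branches into a single GPU kernel.
    return x @ (R_q * scale).T + (x @ L2.T) @ L1.T

W = torch.randn(512, 512)
x = torch.randn(4, 512)
err = (svdquant_forward(x, *svdquant_decompose(W)) - x @ W.T).abs().mean()
print(f"mean abs error vs. full precision: {err:.4f}")
```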
Justification:
SVDQuant addresses both the memory footprint and the latency of large diffusion models:
- Efficient Memory Management: By absorbing weight outliers into the low-rank branch, SVDQuant reduces memory usage by up to 3.5× in models like FLUX.1 and achieves a 3× speedup over a weight-only quantized baseline (see the rough arithmetic after this list).
- Maintained Quality: Even at 4 bits, SVDQuant preserves visual fidelity on par with the 16-bit model, making it suitable for quality-sensitive, high-performance applications.
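As a rough sanity check on the memory claim (back-of-the-envelope arithmetic, not a measurement): the FLUX.1 transformer has roughly 12B parameters, and the ~10% overhead assumed below for the low-rank branch is my own estimate.

```python
# Back-of-the-envelope memory estimate for a ~12B-parameter transformer.
# The ~10% low-rank-branch overhead is an assumption; activations,
# text encoders, and the VAE are ignored entirely.
params = 12e9
bf16_gib = params * 2 / 2**30      # 16-bit: 2 bytes per weight
int4_gib = params * 0.5 / 2**30    # 4-bit: 0.5 bytes per weight
svdq_gib = int4_gib * 1.10         # assumed low-rank overhead
print(f"BF16 ~{bf16_gib:.1f} GiB, SVDQuant ~{svdq_gib:.1f} GiB "
      f"(~{bf16_gib / svdq_gib:.1f}x smaller)")
```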
Implementation Steps:
- Load SVDQuant Checkpoints: Integrate loading of SVDQuant-formatted checkpoints (a loading sketch follows after this list).
- Develop FLUX fn4 Checkpoint: Produce and validate the FLUX fn4 checkpoint with SVDQuant's post-training quantization, benchmarking against the non-quantized model to confirm quality retention.
- Optimize Performance: Use Nunchaku's kernel fusion, which runs the low-rank branch and the 4-bit branch in a single kernel, to minimize data movement and reduce latency.
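For the checkpoint-loading step, here is a sketch of what the integration could look like, based on the usage example in the Nunchaku repository. The class name, import path, and checkpoint ID are taken from their README at the time of writing and may have changed; treat this as an assumption, not a tested snippet.

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel  # import path may differ by version

# Load the SVDQuant 4-bit transformer and drop it into the standard
# diffusers FLUX pipeline in place of the BF16 transformer.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"  # checkpoint ID assumed from the README
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photo of a cat", num_inference_steps=28).images[0]
image.save("cat.png")
```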
Any news on this? The issue has been open since last November without a maintainer response.