mlx-vlm
Add AWQ/DWQ for Vision Models
Investigate and implement Activation-aware Weight Quantization (AWQ) and Dynamic Weight Quantization (DWQ) for vision models.

Motivation: Vision models often have large parameter counts and high compute requirements. Quantization techniques such as AWQ and DWQ could significantly reduce compute and memory footprint while maintaining acceptable quality, which is particularly important for multimodal applications.
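To make the AWQ idea concrete, here is a minimal sketch of activation-aware scaling in plain Python. It is not mlx-vlm code and makes several assumptions: every function name (`quantize_rows`, `channel_scales`, `awq_quantize`, `search_alpha`) is hypothetical, quantization is symmetric round-to-nearest with one scale per output row, and the calibration activations are synthetic. The published AWQ method also uses group-wise quantization and weight clipping and folds the scales into the preceding layer, all of which are left out here.

```python
import random

def quantize_rows(w, bits=4):
    """Symmetric round-to-nearest quantization, one step size per output row.
    Returns the dequantized weights so errors can be measured directly."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for row in w:
        step = (max(abs(v) for v in row) / qmax) or 1.0  # avoid a zero step
        out.append([max(-qmax - 1, min(qmax, round(v / step))) * step for v in row])
    return out

def channel_scales(acts, alpha):
    """Per-input-channel scale: (mean |activation|) ** alpha."""
    n = len(acts)
    means = [sum(abs(x[j]) for x in acts) / n for j in range(len(acts[0]))]
    return [max(m, 1e-8) ** alpha for m in means]

def awq_quantize(w, acts, alpha, bits=4):
    """Scale salient input channels up before quantizing, then undo the scale."""
    s = channel_scales(acts, alpha)
    scaled = [[v * sj for v, sj in zip(row, s)] for row in w]
    q = quantize_rows(scaled, bits)
    return [[v / sj for v, sj in zip(row, s)] for row in q]

def output_error(w, w_hat, acts):
    """Mean squared error between the outputs x @ W^T and x @ W_hat^T."""
    err = 0.0
    for x in acts:
        for row, row_hat in zip(w, w_hat):
            y = sum(a * b for a, b in zip(x, row))
            y_hat = sum(a * b for a, b in zip(x, row_hat))
            err += (y - y_hat) ** 2
    return err / (len(acts) * len(w))

def search_alpha(w, acts, bits=4, steps=10):
    """Grid search over alpha in [0, 1]; alpha = 0 is plain round-to-nearest."""
    best = None
    for k in range(steps + 1):
        alpha = k / steps
        err = output_error(w, awq_quantize(w, acts, alpha, bits), acts)
        if best is None or err < best[1]:
            best = (alpha, err)
    return best

# Synthetic calibration data (an assumption): channel 0 has 10x larger activations.
random.seed(0)
acts = [[random.gauss(0, 1) * (10.0 if j == 0 else 1.0) for j in range(8)]
        for _ in range(32)]
w = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

rtn_err = output_error(w, quantize_rows(w), acts)
best_alpha, best_err = search_alpha(w, acts)
print(f"round-to-nearest error: {rtn_err:.4f}")
print(f"AWQ-style error: {best_err:.4f} at alpha={best_alpha}")
```

Because the grid includes `alpha = 0`, the searched result can never be worse than plain round-to-nearest on the calibration set. How much it helps on a real vision tower depends on how uneven the activation magnitudes are across channels, which is part of what this issue would need to investigate.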