llm-awq
Open-Flamingo reference
In the paper you state the following. How can quantization be done for Open-Flamingo?
"Thanks to better generalization, it also achieves good quantization performance for instruction-tuned LMs (e.g., Vicuna) and, for the first time, multi-modal LMs (Open-Flamingo [2]). Thanks to our efficient kernels, AWQ achieves 1.45× and 2× speedup over GPTQ and GPTQ with reordering on A100."