AutoAWQ
Support MiniCPM3.0
This time no new files were added; instead, an existing AutoAWQ class is inherited. The model can be quantized with the most basic, standard AutoAWQ method.
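For readers unfamiliar with what the basic method does to the weights, here is a minimal sketch of group-wise 4-bit quantization with a zero point. It assumes the default config shown in the AutoAWQ README (`w_bit=4`, `q_group_size=128`, `zero_point=True`) and covers only the round-to-nearest step. AWQ's activation-aware scale search is omitted, and the function names are hypothetical illustrations, not AutoAWQ's API.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], w_bit: int = 4) -> Tuple[List[int], float, int]:
    """Asymmetric (zero-point) quantization of one group of weights (hypothetical helper)."""
    max_int = 2 ** w_bit - 1  # 15 for 4-bit; the integer range is [0, max_int]
    w_min, w_max = min(weights), max(weights)
    # Clamp the range to a small minimum so a constant group never yields a zero scale.
    scale = max(w_max - w_min, 1e-5) / max_int
    # The zero point shifts w_min onto integer 0, clamped into the valid range.
    zero = min(max(-round(w_min / scale), 0), max_int)
    q = [min(max(round(w / scale) + zero, 0), max_int) for w in weights]
    return q, scale, zero


def dequantize_group(q: List[int], scale: float, zero: int) -> List[float]:
    """Map integers back to approximate float weights."""
    return [(v - zero) * scale for v in q]


def quantize_row(row: List[float], group_size: int = 128, w_bit: int = 4) -> List[float]:
    """Quantize a weight row group by group and return the dequantized values."""
    out: List[float] = []
    for start in range(0, len(row), group_size):
        q, scale, zero = quantize_group(row[start:start + group_size], w_bit)
        out.extend(dequantize_group(q, scale, zero))
    return out


if __name__ == "__main__":
    row = [i / 100 - 1.28 for i in range(256)]  # toy "weights", two groups of 128
    approx = quantize_row(row)
    print(max(abs(a - b) for a, b in zip(row, approx)))
```

Each group gets its own scale and zero point, so an outlier only coarsens the 128 weights around it rather than the whole row. That is why a small group size keeps the error of 4-bit weights manageable.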
Hello, here are the results of the perplexity test:
Pretrained model (minicpm3): GPU usage 8.67 GB, perplexity 7.522 (170/170 iterations in 00:30, 5.58 it/s)
AWQ model (minicpm3): GPU usage 3.29 GB, perplexity 8.195 (164/164 iterations)
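To put the two runs side by side, the sketch below uses the standard definition of perplexity (the exponential of the mean per-token negative log-likelihood) and computes the relative change between the reported numbers. The exact windowing used by the evaluation script in this PR is not stated, so `perplexity` here is an illustrative helper, not the script's code.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_change(baseline: float, new: float) -> float:
    """Signed change of `new` relative to `baseline` (0.1 means +10%)."""
    return (new - baseline) / baseline


# Numbers reported above for minicpm3.
PPL_FP, PPL_AWQ = 7.522, 8.195
MEM_FP_GB, MEM_AWQ_GB = 8.67, 3.29

if __name__ == "__main__":
    print(f"perplexity change: {relative_change(PPL_FP, PPL_AWQ):+.1%}")        # +8.9%
    print(f"GPU memory change: {relative_change(MEM_FP_GB, MEM_AWQ_GB):+.1%}")  # -62.1%
```

In other words, the AWQ model uses roughly 62% less GPU memory at the cost of a perplexity increase of about 9%.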
@LDLINGLINGLING Sorry for taking so long. I simplified the modeling code and added your custom quantizer to the docs. We now use Triton kernels, which work out of the box with smaller models such as MiniCPM3-4B, so there are no more CUDA issues.