AutoAWQ support minicpm3.0


Open LDLINGLINGLING opened this issue 1 year ago • 1 comments

This time no new files were added; the model class inherits from the original one. The model can be quantized with the basic method of the original AutoAWQ.
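For readers unfamiliar with that basic method, here is a minimal sketch of the usual AutoAWQ flow. It is not taken from this PR: the `quantize_minicpm3` helper, the `QUANT_CONFIG` values, and the model id and output directory in the usage comment are assumptions for illustration, and running it needs `autoawq`, `transformers`, and a GPU.

```python
# Settings commonly seen in AutoAWQ examples (assumed, not taken from this PR).
QUANT_CONFIG = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}


def quantize_minicpm3(model_path: str, quant_path: str, quant_config: dict = QUANT_CONFIG) -> None:
    """Hypothetical helper: quantize a checkpoint with AutoAWQ and save it."""
    # Imported lazily so this file can be loaded without the GPU stack installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    # MiniCPM3 ships custom modeling code, so trust_remote_code is assumed to be needed.
    model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Runs calibration and quantizes the weights according to quant_config.
    model.quantize(tokenizer, quant_config=quant_config)

    # Write the quantized weights and the tokenizer side by side.
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)


# Example call (model id and output directory are assumptions):
# quantize_minicpm3("openbmb/MiniCPM3-4B", "minicpm3-4b-awq")
```

The imports live inside the function only so the snippet can be read and loaded without the dependencies; in a real script they would normally sit at the top.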

Sep 06 '24 11:09 LDLINGLINGLING

Hello, the following are the results of the perplexity test:

pretrained model: minicpm3
gpu usage: 8.67GB
Perplexity 7.522: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 170/170 [00:30<00:00, 5.58it/s]

awq model: minicpm3
gpu usage: 3.29GB
Perplexity 8.195: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 164/164
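To put those numbers side by side, the short sketch below computes the relative change in GPU memory and perplexity. The four figures are copied from the output above; the `relative_change` helper and the variable names are illustrative only.

```python
# Figures copied from the perplexity test output above.
baseline = {"gpu_gb": 8.67, "perplexity": 7.522}
awq = {"gpu_gb": 3.29, "perplexity": 8.195}


def relative_change(before: float, after: float) -> float:
    """Return the signed relative change from `before` to `after` as a fraction."""
    return (after - before) / before


mem_change = relative_change(baseline["gpu_gb"], awq["gpu_gb"])
ppl_change = relative_change(baseline["perplexity"], awq["perplexity"])

print(f"GPU memory: {mem_change:+.1%}")
print(f"Perplexity: {ppl_change:+.1%}")
```

In other words, the AWQ model uses roughly 62% less GPU memory while its perplexity is about 9% higher than the pretrained model's.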

Sep 06 '24 11:09 LDLINGLINGLING

@LDLINGLINGLING Sorry for taking so long. I simplified the modeling and added your custom quantizer to the docs. We now use Triton kernels which work with smaller models like MiniCPM3 4B out of the box, so there are no more CUDA issues.

Nov 14 '24 07:11 casper-hansen
