AutoAWQ

Support MiniCPM3.0

Open LDLINGLINGLING opened this issue 1 year ago • 1 comments

This time, no new files were added; the implementation inherits from the original class. The model can be quantized with the standard AutoAWQ workflow.
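For readers unfamiliar with that workflow, here is a minimal sketch of the usual AutoAWQ quantization steps. The model ID `openbmb/MiniCPM3-4B`, the output directory, and the config values are assumptions chosen for illustration (the config mirrors the common 4-bit, group-size-128 setup from the AutoAWQ examples), not settings confirmed in this thread.

```python
# Illustrative sketch only: the paths and config values below are assumptions.
QUANT_CONFIG = {
    "zero_point": True,    # asymmetric quantization with zero points
    "q_group_size": 128,   # weights share one scale per group of 128
    "w_bit": 4,            # 4-bit weights
    "version": "GEMM",     # kernel variant used for the packed weights
}


def quantize_model(model_path, quant_path, quant_config=None):
    """Quantize a Hugging Face model with AutoAWQ and save the result."""
    # Imported lazily so this module can be loaded without a GPU stack installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    quant_config = quant_config or QUANT_CONFIG

    # MiniCPM3 ships custom modeling code, hence trust_remote_code=True.
    model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Run AWQ calibration and pack the weights.
    model.quantize(tokenizer, quant_config=quant_config)

    # Save the quantized weights together with the tokenizer.
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your environment.
    quantize_model("openbmb/MiniCPM3-4B", "minicpm3-4b-awq")
```

Calling `quantize_model` downloads the model and needs a GPU, so treat the `__main__` section as a template rather than something to run unchanged.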

LDLINGLINGLING avatar Sep 06 '24 11:09 LDLINGLINGLING

Hello, here are the results of the perplexity test:

pretrained model: minicpm3
gpu usage: 8.67GB
Perplexity 7.522: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 170/170 [00:30<00:00, 5.58it/s]
awq model: minicpm3
gpu usage: 3.29GB
Perplexity 8.195: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 164/164
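To put these numbers in perspective, the short sketch below computes the relative change in GPU memory and perplexity between the two runs. It uses only the figures reported above; the helper name `relative_change` is introduced here for illustration.

```python
# Figures copied from the perplexity test output above.
baseline = {"gpu_gb": 8.67, "perplexity": 7.522}
awq = {"gpu_gb": 3.29, "perplexity": 8.195}


def relative_change(before, after):
    """Return the fractional change from `before` to `after`."""
    return (after - before) / before


memory_change = relative_change(baseline["gpu_gb"], awq["gpu_gb"])
ppl_change = relative_change(baseline["perplexity"], awq["perplexity"])

print(f"GPU memory: {memory_change:+.1%}")
print(f"Perplexity: {ppl_change:+.1%}")
```

In other words, the AWQ model uses roughly 62% less GPU memory in exchange for about a 9% increase in perplexity.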

LDLINGLINGLING avatar Sep 06 '24 11:09 LDLINGLINGLING

@LDLINGLINGLING Sorry for taking so long. I simplified the modeling and added your custom quantizer to the docs. We now use Triton kernels, which work with smaller models like MiniCPM3 4B out of the box, so there are no more CUDA issues.

casper-hansen avatar Nov 14 '24 07:11 casper-hansen