ppl.pmx
Support int4 weight-only quantization for Llama3
- Define the weight-only quantized layer in ModelParallel.py
- Add ConvertWeightToOpmx.py and implement the quantization step there
- Update the dynamic- and static-batching modeling for quantized models
- Fix the Woqu function for packed models
- Add a weight-only quantization README.md
- Test export and demo under static and dynamic batching on an A100 GPU
Still work in progress:
- Support AutoAWQ in ConvertWeightToOpmx.py
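For context on what "int4 weight-only quantization" means in the steps above, here is a minimal, illustrative sketch of naive group-wise int4 quantization with two 4-bit values packed per byte. The function name, layout, and group size are assumptions for this example, not the actual API in ConvertWeightToOpmx.py:

```python
import torch

def quantize_int4_groupwise(weight: torch.Tensor, group_size: int = 128):
    """Naive asymmetric per-group int4 quantization of a 2-D weight.

    Hypothetical helper for illustration; the real implementation in
    ConvertWeightToOpmx.py may use different names and layouts.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    # View each row as groups of `group_size` columns.
    w = weight.reshape(out_features, in_features // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    # 4 bits -> 16 quantization levels (0..15).
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale + zero_point), 0, 15).to(torch.uint8)
    q = q.reshape(out_features, in_features)
    # Pack two int4 values into one byte: even column in the low nibble,
    # odd column in the high nibble.
    packed = q[:, 0::2] | (q[:, 1::2] << 4)
    return packed, scale.squeeze(-1), zero_point.squeeze(-1)
```

The packed tensor halves the per-weight storage relative to int8 and quarters it relative to fp16, which is the point of the weight-only scheme: activations stay in floating point and only the weights are compressed.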
@Jzz24 hi, I've finished this PR. Could you review it?
OK
This exports the Llama3 model with naive quantization; AWQ support will be added later. After discussion, we decided not to implement SplitModel and MergeModel for now.
The previous PR hasn't been merged yet, so I kept building on top of it. There is a lot of code, which may make review difficult.
The overall logic is: the quantization steps are defined in ConvertWeightToOpmx.py, which performs the actual quantization. I then adapted the Modeling code to handle quantized models, and the quantized layer definitions live in ModelParallel.py.
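To make the Modeling-side adaptation concrete, here is a minimal sketch of a weight-only quantized linear layer that dequantizes on the fly during the forward pass. It assumes a common packed layout (two 4-bit values per byte, low nibble first, with group-wise scale and zero-point per row); the class name and buffer layout are illustrative, not the actual ModelParallel.py API:

```python
import torch

class Int4WeightOnlyLinear(torch.nn.Module):
    """Illustrative weight-only int4 linear layer.

    Assumed layout: packed is (out, in // 2) uint8 with two 4-bit values
    per byte; scale and zero_point are (out, in // group_size).
    """

    def __init__(self, packed, scale, zero_point, group_size):
        super().__init__()
        self.register_buffer("packed", packed)
        self.register_buffer("scale", scale)
        self.register_buffer("zero_point", zero_point)
        self.group_size = group_size

    def forward(self, x):
        # Unpack: low nibble holds even columns, high nibble odd columns.
        lo = (self.packed & 0xF).to(torch.float32)
        hi = (self.packed >> 4).to(torch.float32)
        q = torch.stack((lo, hi), dim=-1).flatten(start_dim=-2)
        # Broadcast the per-group scale/zero-point over each group's columns.
        scale = self.scale.repeat_interleave(self.group_size, dim=1)
        zp = self.zero_point.repeat_interleave(self.group_size, dim=1)
        w = (q - zp) * scale  # dequantize just-in-time
        return x @ w.t()
```

Dequantizing inside `forward` keeps the memory footprint low at the cost of extra arithmetic; a production kernel would fuse the unpack and matmul, but the numerics are the same.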