[Question/Improvement] Specify the quantization method applied by llmexport.py and MNNConvert
A clear understanding of the quantization method being applied is crucial for choosing the correct Quantization-Aware Trained (QAT) model. Unfortunately, the current docs for llmexport.py and MNNConvert are not explicit about which method is used.
Personally, as a novice, I'm lost choosing between gemma-3-12b-qat-int4-unquantized and gemma-3-12b-qat-q4_0-unquantized, since I can't determine which one is preferable for conversion to MNN. Any tips are welcome.
Well, regardless of the Gemma 3 version chosen (including the non-QAT one), llmexport.py responds with `AttributeError: 'NoneType' object has no attribute 'weight'`.
> Well, regardless of the Gemma 3 version chosen (including the non-QAT one), llmexport.py responds with `AttributeError: 'NoneType' object has no attribute 'weight'`.
Which Gemma 3 model are you converting? We have tested Gemma 3 4B and 1B, and they work. Have you updated MNN to 3.2.0?
See https://mnn-docs.readthedocs.io/en/latest/transformers/llm.html , or execute `python3 llmexport.py -h` to see the quantization options:
```
--awq                 Whether or not to use AWQ quant.
--sym                 Whether or not to use symmetric quant (without zeropoint), default is False.
--quant_bit QUANT_BIT
                      mnn quant bit, 4 or 8, default is 4.
--quant_block QUANT_BLOCK
                      mnn quant block, 0 means channel-wise, default is 128.
--lm_quant_bit LM_QUANT_BIT
                      mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
```
MNN's quantization has many more degrees of freedom than llama.cpp's. Normally, a weight quant block of 64 gives the same precision as Q4_1, while MNN's block of 32 gives higher precision than both Q4_1 and Q4_0.
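For illustration, here is a minimal sketch of a 4-bit export using the flags above. The `--path` and `--export` flags and the local model directory are assumptions based on the linked docs, not taken from this thread:

```bash
# Hedged example: export a local model to MNN with 4-bit weights and
# quant block 32 (smaller blocks give higher precision, per the note above).
# --path/--export are assumed from the MNN docs; adjust to your setup.
python3 llmexport.py \
    --path ./gemma-3-12b-qat-int4-unquantized \
    --export mnn \
    --quant_bit 4 \
    --quant_block 32
```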
Thanks for the response!
> See https://mnn-docs.readthedocs.io/en/latest/transformers/llm.html , or execute `python3 llmexport.py -h` to see the quantization options.
I meant some definite guideline for avoiding mistakes when choosing between QAT models trained for int4 or Q4, or at least a way to clearly establish whether QAT models are suitable for MNN at all. At first I assumed I should choose the int4 version, but after your reply I started leaning towards the vanilla one. (Still ambiguous, as you can see.)
> Which Gemma 3 model are you converting? We have tested Gemma 3 4B and 1B, and they work. Have you updated MNN to 3.2.0?
The 12B models linked above. I have tested 4B, but it's still too weak; I need more capability.
> MNN's quantization has many more degrees of freedom than llama.cpp's. Normally, a weight quant block of 64 gives the same precision as Q4_1, while MNN's block of 32 gives higher precision than both Q4_1 and Q4_0.
Nice table. I relied on block=0 before, but I want to try 32 next time, hopefully for better quality.
> Well, regardless of the Gemma 3 version chosen (including the non-QAT one), llmexport.py responds with `AttributeError: 'NoneType' object has no attribute 'weight'`.
I got the same error with Qwen2.5-VL-7B-Instruct, even with the latest code from the repo and MNN==3.2.0, but it works with Qwen2.5-0.5B-Instruct, which is not a visual LLM.
So, are multimodal LLMs not supported? @jxt1234
It works after I change the 'lm_' value of self.model_map['model'] from 'model.lm_head' to 'lm_head' (the same value that Qwen2.5 uses); after that, the rest of the conversion and quantization passes. I don't know why.
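For reference, a minimal sketch of the workaround described above. The surrounding model_map structure is assumed from llmexport.py and may differ between versions; only the 'lm_' change is taken from this comment:

```python
# Hypothetical excerpt of the exporter's model mapping; only the 'lm_'
# value is taken from the comment above, the rest is assumed.
model_map = {
    'model': {
        # Before: 'lm_': 'model.lm_head' -- for Gemma 3 / Qwen2.5-VL this
        # path apparently resolves to None, and the exporter then fails with
        # AttributeError: 'NoneType' object has no attribute 'weight'.
        'lm_': 'lm_head',  # after: the same value the Qwen2.5 mapping uses
    },
}
```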
Marking as stale. No activity in 60 days.