yuanqian_zhao

Results 21 comments of yuanqian_zhao

Same issue; is this available yet?

Hi! Any progress? Is training LoRA modules with AWQ available now?

> Hi, I'm also interested to know whether LoRA + AWQ is already available now. Thanks!

@RicardoHalak see this; it is runnable: https://github.com/huggingface/transformers/pull/28987
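For anyone landing here, a minimal sketch of the LoRA-on-AWQ flow that PR enables; the checkpoint name and LoRA hyperparameters below are illustrative placeholders, not values taken from the PR:

```python
# Sketch: attach LoRA adapters to an AWQ-quantized checkpoint.
# Requires transformers (with autoawq installed) and peft; the model id
# and LoRA hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "TheBloke/Llama-2-7B-AWQ"  # any AWQ checkpoint; placeholder
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # adjust to the model's layer names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```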

Is it available now? I simply applied GPTQ and AWQ to Yi-6B and tried LoRA training on top of it; however, the loss is NaN.
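For what it's worth, one generic mitigation for NaN loss when LoRA-training a quantized base is peft's `prepare_model_for_kbit_training`, which upcasts the norm layers to fp32 and enables input gradients before the adapters are attached. A sketch under those assumptions, not a confirmed fix for Yi-6B:

```python
# Generic NaN-loss mitigation sketch for LoRA on a quantized base model:
# upcast norm layers to fp32 and enable input gradients before wrapping.
# The checkpoint name is a placeholder.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained("your-quantized-checkpoint",  # placeholder
                                             device_map="auto")
model = prepare_model_for_kbit_training(model)  # fp32 norms, input grads on
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
```

Lowering the learning rate is another common first check before digging deeper.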

@maxin9966 this may be irrelevant to your question, but I'm wondering why, in your code for the chat model, `attention_mask` is just `input_ids.ne(tokenizer.pad_token_id)`; maybe computing the loss only on the response tokens would be better, as in the sketch below.
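A minimal sketch of what I mean, assuming `prompt_ids` and `response_ids` come from tokenizing the prompt and response separately (all names here are illustrative, not from the original code):

```python
# Sketch: compute loss only on the response tokens by setting the prompt
# positions of `labels` to -100, which cross-entropy ignores.
import torch

input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
labels = input_ids.clone()
labels[..., : prompt_ids.shape[-1]] = -100  # ignore prompt tokens in the loss
attention_mask = input_ids.ne(tokenizer.pad_token_id)
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
```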

The loss appears to decrease when you increase the number of samples simply because the value printed on your screen is the total loss divided by the number of samples, i.e., a mean, not a total.
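Concretely, with the default `reduction="mean"` the printed value is a per-sample average, so it does not grow with sample count:

```python
# The default reduction averages over samples, so the printed loss
# does not scale with the number of samples.
import torch
import torch.nn.functional as F

logits = torch.randn(100, 10)            # 100 samples, 10 classes
targets = torch.randint(0, 10, (100,))
mean_loss = F.cross_entropy(logits, targets)                   # default: "mean"
sum_loss = F.cross_entropy(logits, targets, reduction="sum")
assert torch.isclose(mean_loss, sum_loss / 100)
```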

> Hi, for the W4A16 configuration, most mainstream quantization algorithms can achieve good results. If using a per-channel setting for W4A16, you can choose any quantization algorithm you prefer. However,...

I tried applying the QoQ method to MiniCPM3-4B and measured the accuracy drops on the BBH/MMLU/C-Eval/CMMLU/HumanEval/MBPP/GSM8K/MATH benchmarks. C-Eval, HumanEval, MBPP, and GSM8K dropped by about 10 percentage points, while the...

@HandH1998 Yes, it is rotation + GPTQ, with evaluation based on Transformers + UltraEval. Specifically, I applied lmquant (w4a8kv4, group size = 32, only adding all R1-type rotation matrices declared in SpinQuant, no smoothing involved) to...
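For readers unfamiliar with the R1 rotations: they are orthogonal matrices folded into adjacent weights and activations, so the full-precision function is unchanged while outliers get spread out before quantization. A minimal numerical sketch of that invariance with a random orthogonal matrix (SpinQuant's actual R1 is learned/Hadamard-based; this only demonstrates the identity):

```python
# R1-style rotation identity: folding an orthogonal matrix Q into both
# activations and weights leaves the full-precision output unchanged,
# while the rotated weights are friendlier to quantize.
import torch

d = 64
W = torch.randn(128, d)                    # a linear layer's weight
x = torch.randn(4, d)                      # a batch of activations
Q, _ = torch.linalg.qr(torch.randn(d, d))  # random orthogonal matrix

y = x @ W.T                                # original output
y_rot = (x @ Q) @ (W @ Q).T                # rotated activations and weights
assert torch.allclose(y, y_rot, atol=1e-4)
```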