I fixed this problem by pulling the latest PR you committed last week, thanks!
@moseshu Have you figured this problem out?
@lvhan028 @pppppM One more question: currently, in calib_dataloader.py, the calibration data fed to the tokenizer has the same format regardless of the model type. For example, in https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/lite/utils/calib_dataloader.py#L28-L29, the tokenized data is identical even for LLMs with different instruction formats, such as Qwen2 and Llama2. Could the calibration data be wrapped according to the model's chat template before it is fed in, e.g., adding markers like "User" for Qwen2? Would that improve quantization accuracy? Thanks!
For example, as the Qwen team notes in https://github.com/QwenLM/Qwen/issues/657#issuecomment-1820628134 (albeit for GPTQ), the quantization calibration data should ideally match the fine-tuning format, i.e., the chat template. Also, since calib_seqlen truncates at 2048 by default, does that mean that for Qwen-72B, calibration data between the 2k and 32k maximum lengths can never be fed in?
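A minimal sketch of the idea, assuming a Hugging Face tokenizer that ships a chat template; the model name, helper function, and sample texts below are illustrative placeholders, not lmdeploy's actual calibration pipeline:

```python
# Hypothetical sketch: wrap each calibration sample with the model's chat
# template before tokenizing, so the calibration distribution matches the
# chat/fine-tuning format. This is NOT lmdeploy's calib_dataloader code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

def encode_for_calibration(text: str, seqlen: int = 2048):
    # Treat the raw calibration text as a single-turn user message.
    messages = [{"role": "user", "content": text}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # seqlen mirrors the default 2048-token truncation discussed above.
    return tokenizer(prompt, truncation=True, max_length=seqlen,
                     return_tensors="pt")

# calib_texts would come from the calibration dataset (e.g. a C4 slice).
calib_texts = ["Example calibration passage ..."]
batch = [encode_for_calibration(t) for t in calib_texts]
```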
+1 I have run into the same accuracy drop after quantization. How can I debug it, and how should I choose the calibration dataset? @Tracin @byshiue
> For reference, the Mistral model degrades in performance over time just like dense attention methods:
>
> Here, `attention_sinks` refers to the StreamingLLM approach, `transformers` is their model used via...
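For readers following along, a minimal usage sketch of the StreamingLLM approach via the `attention_sinks` package (https://github.com/tomaarsen/attention_sinks); the sink and window sizes below are illustrative, not tuned values:

```python
# Sketch: load Mistral with attention sinks so the first few tokens plus a
# sliding window of recent tokens stay in the KV cache (the StreamingLLM
# approach referenced above).
from attention_sinks import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",
    attention_sink_size=4,            # initial "sink" tokens kept permanently
    attention_sink_window_size=1020,  # most recent tokens kept in the cache
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
```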
> > when the input length grows to 8k, it failed. Is this right?
>
> That's right. Although the model doesn't crash until 128k, it doesn't perform well once...
> Correct, not for `mistralai/Mistral-7B-v0.1`, at least. There are some Mistral-based models that work on longer sequence lengths, e.g.: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Thanks Tom, I'll check the URL later!
> Correct, not for `mistralai/Mistral-7B-v0.1`, at least. There are some Mistral-based models that work on longer sequence lengths, e.g.: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Hi @tomaarsen, I have another problem here. In your...
@pseudotensor how did you solve this? Was it a hardware problem?