ehuaa

Results: 29 comments by ehuaa

I fixed this problem by pulling the latest PR you committed last week, thanks!

@moseshu Have you figured this problem out?

@lvhan028 @pppppM One more question: currently in calib_dataloader.py, the calibration data fed to the tokenizer is formatted identically for every model type. For example, at https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/lite/utils/calib_dataloader.py#L28-L29, models with different instruction formats such as Qwen2 and Llama2 still get the same tokenized data. Could the calibration data be wrapped with each model's chat template before being fed in, e.g. adding markers like "User" for Qwen2 (see the sketch below)? Would that improve quantization accuracy? Thanks!
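To make the idea concrete, here is a minimal sketch of the wrapping I have in mind, assuming a recent `transformers` tokenizer that ships a chat template; the checkpoint name and the `calib_texts`/`calib_seqlen` variables are placeholders for illustration, not lmdeploy's actual API:

```python
# Sketch: wrap each raw calibration sample in the model's chat template
# before tokenization, instead of tokenizing the plain text directly.
# Assumes a recent `transformers` tokenizer with a chat template configured;
# the checkpoint name, `calib_texts`, and `calib_seqlen` are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")
calib_texts = ["Explain the attention mechanism.", "..."]
calib_seqlen = 2048

samples = []
for text in calib_texts:
    # Render the sample as a single-turn conversation so it matches the
    # instruction format the model saw during finetuning.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": text}],
        tokenize=False,
        add_generation_prompt=True,
    )
    samples.append(
        tokenizer(prompt, truncation=True, max_length=calib_seqlen,
                  return_tensors="pt")
    )
```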

https://github.com/QwenLM/Qwen/issues/657#issuecomment-1820628134 For example, as the Qwen team states here (albeit for GPTQ), the calibration data format should ideally match the finetuning format, i.e. the chat template. Also, with calib_seqlen truncating at 2048 by default, doesn't that mean calibration samples between 2k and the 32k maximum length of Qwen-72B can never be fed in?
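Continuing the sketch above (same hypothetical `tokenizer`/`calib_texts`), this is the truncation concern in code form:

```python
# Sketch: the default calib_seqlen=2048 truncates every sample to 2k tokens,
# so the 2k-32k range a long-context model supports is never exercised during
# calibration. Raising the cap (and keeping some genuinely long samples)
# avoids that; `tokenizer`/`calib_texts` are placeholders from the sketch above.
calib_seqlen = 32768  # match the model's max context instead of 2048

long_samples = [
    tokenizer(t, truncation=True, max_length=calib_seqlen, return_tensors="pt")
    for t in calib_texts
    if len(tokenizer(t).input_ids) > 2048  # keep genuinely long samples
]
```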

+1 I have met the same problem with the accuracy drop after quantization. How can I debug it, and how should I choose the calibration dataset? @Tracin @byshiue

> For reference, the Mistral model degrades in performance over time just like dense attention methods: [figure: model performance degrading as input length grows] Here, `attention_sinks` refers to the StreamingLLM approach, `transformers` is their model used via...
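For context, the `attention_sinks` curve in plots like the one quoted typically comes from tomaarsen's drop-in package of the same name; a minimal sketch of how it is loaded, with parameter names taken from its README (treat the exact values as assumptions):

```python
# Sketch: loading a model with StreamingLLM-style attention sinks via the
# `attention_sinks` drop-in package. Parameter names follow its README;
# treat the exact values as assumptions, not the authors' benchmark setup.
from attention_sinks import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    attention_sink_size=4,            # initial tokens kept as attention "sinks"
    attention_sink_window_size=1020,  # sliding window over the most recent tokens
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
```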

> > when the input length grows to 8k, it failed. Is this right?
>
> That's right. Although the model doesn't crash until 128k, it doesn't perform well once...

> Correct, not for `mistralai/Mistral-7B-v0.1`, at least. There are some Mistral-based models that work on longer sequence lengths, e.g.: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Thanks Tom, I'll check the URL later!

> Correct, not for `mistralai/Mistral-7B-v0.1`, at least. There are some Mistral-based models that work on longer sequence lengths, e.g.: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Hi @tomaarsen, I have another problem here. In your...