I fixed this problem by pulling the latest PR you committed last week, thanks!
@moseshu Have you figured this problem out?
@lvhan028 @pppppM One more question: currently, in calib_dataloader.py, the calibration data fed to the tokenizer has the same format regardless of the model type. For example, in https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/lite/utils/calib_dataloader.py#L28-L29, the tokenized data is identical even for LLMs with different instruction formats, such as Qwen2 and Llama2. Could the calibration data be wrapped according to the model's chat template before it is fed in, e.g., adding markers like "User" for Qwen2? Would that improve quantization accuracy? Thanks!
For example, as the Qwen team notes in https://github.com/QwenLM/Qwen/issues/657#issuecomment-1820628134 (albeit for GPTQ), the quantization calibration data should ideally match the fine-tuning format, i.e., the chat template. Also, since calib_seqlen truncates at 2048 by default, does that mean that for Qwen-72B, calibration data between the 2k and 32k maximum lengths can never be fed in?
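A minimal sketch of the idea, assuming a Hugging Face tokenizer that ships a chat template; the model name, helper function, and sample texts below are illustrative placeholders, not lmdeploy's actual calibration pipeline:

```python
# Hypothetical sketch: wrap each calibration sample with the model's chat
# template before tokenizing, so the calibration distribution matches the
# chat/fine-tuning format. This is NOT lmdeploy's calib_dataloader code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

def encode_for_calibration(text: str, seqlen: int = 2048):
    # Treat the raw calibration text as a single-turn user message.
    messages = [{"role": "user", "content": text}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # seqlen mirrors the default 2048-token truncation discussed above.
    return tokenizer(prompt, truncation=True, max_length=seqlen,
                     return_tensors="pt")

# calib_texts would come from the calibration dataset (e.g. a C4 slice).
calib_texts = ["Example calibration passage ..."]
batch = [encode_for_calibration(t) for t in calib_texts]
```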
+1 I have run into the same accuracy drop after quantization. How can I debug it, and how should I choose the calibration dataset? @Tracin @byshiue
> For reference, the Mistral model degrades in performance over time just like dense attention methods:
>
> Here, `attention_sinks` refers to the StreamingLLM approach, `transformers` is their model used via...
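For readers following along, a minimal usage sketch of the StreamingLLM approach via the `attention_sinks` package (https://github.com/tomaarsen/attention_sinks); the sink and window sizes below are illustrative, not tuned values:

```python
# Sketch: load Mistral with attention sinks so the first few tokens plus a
# sliding window of recent tokens stay in the KV cache (the StreamingLLM
# approach referenced above).
from attention_sinks import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",
    attention_sink_size=4,            # initial "sink" tokens kept permanently
    attention_sink_window_size=1020,  # most recent tokens kept in the cache
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
```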
> > when the input length grows to 8k, it failed. Is this right?
>
> That's right. Although the model doesn't crash until 128k, it doesn't perform well once...
> Correct, not for `mistralai/Mistral-7B-v0.1`, at least. There are some Mistral-based models that work on longer sequence lengths, e.g.: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Thanks Tom, I'll check the URL later!
> Correct, not for `mistralai/Mistral-7B-v0.1`, at least. There are some Mistral-based models that work on longer sequence lengths, e.g.: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

Hi @tomaarsen, I have another problem here. In your...
@pseudotensor how did you solve this? Was it a hardware problem?