Model quantization broke: calibration produced NaN values in the parameters. The calibration strategy probably needs to be adjusted.
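As a rough sketch of what "adjusting the calibration strategy" could look like (assuming lmdeploy's `lite calibrate` subcommand; the model path is a placeholder and flag names may differ across versions):

```bash
# Hypothetical re-run of calibration with more conservative settings;
# tweak the dataset, sample count, and sequence length until the collected
# statistics no longer contain NaNs.
lmdeploy lite calibrate ./internlm2-chat-7b-merge \
    --calib-dataset ptb \
    --calib-samples 128 \
    --calib-seqlen 2048 \
    --work-dir ./calib_output
```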
@xxg98 You can first use xtuner chat to verify that the fine-tuned model converses normally: `xtuner chat /root/autodl-tmp/projects/LLM/fine_tuning/7b/internlm2-chat-7b-merge --prompt-template internlm2_chat`. Also, check whether the EvalHook output in the training log looks normal.
@yinfan98 It's greatly needed! There are two ways to implement this feature, the first one is the method that @fanqiNO1 mentioned; the second one is to simplify and migrate the...
@lvhan028 Currently testing g64 quantization.
@zhanghui-china A model quantized by lmdeploy can only be used with lmdeploy chat; xtuner chat only accepts the original fp16 model.
Please first confirm how the un-quantized model behaves with lmdeploy chat. If its output is abnormal, the correct chat template was probably not matched, and you need to specify --model-name manually; the available model names can be listed with lmdeploy list. A sketch of that check is shown below.
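For example (the path and name below are placeholders; the exact chat subcommand form may differ between lmdeploy versions):

```bash
# List the model / chat-template names that lmdeploy knows about.
lmdeploy list

# Chat with the un-quantized fp16 model, forcing the matching template.
lmdeploy chat ./internlm2-chat-7b-merge --model-name internlm2-chat-7b
```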
@gesanqiu LMDeploy is functionally capable of quantizing CodeLlama, but in practical use we found that performance declines significantly after quantization. We are also investigating the specific reasons for...
We have tested the performance of INT4 weight-only quantization on OpenCompass. According to the results, the performance of INT4 quantization is on par with FP16. If you can provide us...
[AWQ](https://arxiv.org/abs/2306.00978) is an INT4 weight-only quantization algorithm. The implementation of AWQ in LMDeploy includes numerous engineering optimizations, ensuring superior speed and accuracy compared to the [official version](https://github.com/mit-han-lab/llm-awq). We have...
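For reference, a minimal sketch of running lmdeploy's AWQ quantization on an fp16 HF model (the model path and work dir are placeholders; check `lmdeploy lite auto_awq --help` for the flags available in your version):

```bash
# Quantize an fp16 Hugging Face model to 4-bit weights with AWQ.
# --w-bits sets the weight bit-width, --w-group-size the per-group
# quantization granularity; the result is written to --work-dir.
lmdeploy lite auto_awq ./internlm2-chat-7b \
    --w-bits 4 \
    --w-group-size 128 \
    --work-dir ./internlm2-chat-7b-4bit
```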
Converting phi3 to gguf can be tricky. It requires first transforming the LLM part into llama format. The gguf on the HF hub is our temporary, hard-coded conversion. We are...