Model quantization broke: calibration produced NaN values in the parameters. The calibration strategy probably needs to be adjusted.
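As a rough sketch of what "adjusting the calibration strategy" could look like (assuming lmdeploy's `lite calibrate` subcommand; the model path is a placeholder and flag names may differ across versions):

```bash
# Hypothetical re-run of calibration with more conservative settings;
# tweak the dataset, sample count, and sequence length until the collected
# statistics no longer contain NaNs.
lmdeploy lite calibrate ./internlm2-chat-7b-merge \
    --calib-dataset ptb \
    --calib-samples 128 \
    --calib-seqlen 2048 \
    --work-dir ./calib_output
```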
@xxg98 You can first use xtuner chat to verify that the fine-tuned model converses normally: `xtuner chat /root/autodl-tmp/projects/LLM/fine_tuning/7b/internlm2-chat-7b-merge --prompt-template internlm2_chat`. Also, check whether the EvalHook output in the training log looks normal.
@yinfan98 It's greatly needed! There are two ways to implement this feature, the first one is the method that @fanqiNO1 mentioned; the second one is to simplify and migrate the...
@lvhan028 Currently testing g64 quantization.
@zhanghui-china A model quantized by lmdeploy can only be used with lmdeploy chat; xtuner chat only accepts the original fp16 model.
Please first confirm how the un-quantized model behaves with lmdeploy chat. If its output is abnormal, the correct chat template was probably not matched, and you need to specify --model-name manually; the available model names can be listed with lmdeploy list. A sketch of that check is shown below.
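For example (the path and name below are placeholders; the exact chat subcommand form may differ between lmdeploy versions):

```bash
# List the model / chat-template names that lmdeploy knows about.
lmdeploy list

# Chat with the un-quantized fp16 model, forcing the matching template.
lmdeploy chat ./internlm2-chat-7b-merge --model-name internlm2-chat-7b
```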
@gesanqiu LMDeploy is functionally capable of quantizing CodeLlama, but in practical use we found that performance declines significantly after quantization. We are also investigating the specific reasons for...
We have tested the performance of INT4 weight-only quantization on OpenCompass. According to the results, the performance of INT4 quantization is on par with FP16. If you can provide us...
[AWQ](https://arxiv.org/abs/2306.00978) is an INT4 weight-only quantization algorithm. The implementation of AWQ in LMDeploy includes numerous engineering optimizations, ensuring superior speed and accuracy compared to the [official version](https://github.com/mit-han-lab/llm-awq). We have...
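For reference, a minimal sketch of running lmdeploy's AWQ quantization on an fp16 HF model (the model path and work dir are placeholders; check `lmdeploy lite auto_awq --help` for the flags available in your version):

```bash
# Quantize an fp16 Hugging Face model to 4-bit weights with AWQ.
# --w-bits sets the weight bit-width, --w-group-size the per-group
# quantization granularity; the result is written to --work-dir.
lmdeploy lite auto_awq ./internlm2-chat-7b \
    --w-bits 4 \
    --w-group-size 128 \
    --work-dir ./internlm2-chat-7b-4bit
```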
Converting phi3 to gguf can be tricky. It requires first transforming the LLM part into llama format. The gguf on the HF hub is our temporary, hard-coded conversion. We are...