> > pip install git+https://github.com/shibing624/lmft.git installs the in-development version; the new features haven't been released to pip yet.
>
> Installing this way, inference reports an error: call script:

I also hit this error when testing on the server; is there an incompatibility? @shibing624 chatglm-6b seems to have updated their model slightly on April 6:

```
Removed the image tokens from the embedding to reduce GPU memory usage (requires updating the model files pytorch_model-00001-of-00008.bin and pytorch_model-00008-of-00008.bin; thanks to [@silverriver](https://github.com/silverriver) for the idea). Dropped the dependency on icetk (requires updating the model file ice_text.model).
```

My local machine has the older model downloaded on April 4, and inference works in testing, but because GPU memory is insufficient I set the load args to quantize to 8-bit and enable fp16. The inference output is a bit odd:

```
['少先队员应该为老人让座。\n正确的写法应该是:少先队员应该为老人让座。其中,“因该”是错误的拼写,正确的写法应该是“应该”。同时,“老人”的拼写也不正确,正确的写法应该是“老年人”。“少先队员”的拼写正确。']
```

Full log: ```...
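For context, the "quantize to 8-bit and enable fp16" load roughly corresponds to the stock chatglm-6b remote-code calls below; how lmft maps its `quantization_bit`/`fp16` args onto these is my assumption, and the model path is a placeholder.

```python
# Minimal sketch of loading chatglm-6b under tight GPU memory: int8 quantization
# plus fp16 weights, using the model's own remote-code API.
from transformers import AutoModel, AutoTokenizer

model_name = "THUDM/chatglm-6b"  # placeholder; I load the April 4 local snapshot
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = (
    AutoModel.from_pretrained(model_name, trust_remote_code=True)
    .quantize(8)  # chatglm's built-in int8 quantization
    .half()       # fp16
    .cuda()
)
model.eval()

# The correction prompt mirrors the test sentence used in this thread.
response, _ = model.chat(tokenizer, "少先队员因该为老人让坐。", history=[])
print(response)
```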
> > chatglm-6b with the latest weights. In my local tests "quantization_bit": None, i.e. quantization off, and I didn't use int8 during training either; my environment is a V100, which doesn't support int8. From your results it looks like the LoRA still isn't taking effect. Once the LoRA is active, the output strictly follows the format '少先队员应该为老人让座。\n错误字: '.
>
> But this output also differs from what chatglm-6b itself produces. There may be some sampling jitter, but repeated calls to the original model never produce this format, so it feels like the LoRA did take effect.

Right. To check, I deliberately removed the LoRA but still loaded the model the lmft way, and the output was:

```
['少先队员应该为老人让座。\n\n正确的拼音是:"sòu bèi shǒu gèng jiā",其中,“少先队员”的拼音是“sòu bèi shǒu gèng”,“老人”的拼音是“gèng jiā”。']
```

When I then load the LoRA, I still get the same "RuntimeError: Expected 4-dimensional input for...
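A minimal sketch of that base-vs-LoRA comparison with the standard peft API, assuming lmft's saved adapter is peft-compatible; the adapter directory and test query are hypothetical placeholders.

```python
# Hedged sketch: run the same query through the base model and the base+LoRA
# model to check whether the adapter actually changes the output format.
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_name = "THUDM/chatglm-6b"
tokenizer = AutoTokenizer.from_pretrained(base_name, trust_remote_code=True)
base = AutoModel.from_pretrained(base_name, trust_remote_code=True).half().cuda()

query = "少先队员因该为老人让坐。"  # test sentence from this thread
print(base.chat(tokenizer, query, history=[])[0])  # base model only

# peft injects LoRA layers into the base model's modules in place, so .chat
# (resolved through attribute passthrough) now runs with the adapter active.
lora = PeftModel.from_pretrained(base, "outputs/csc-lora")  # hypothetical dir
print(lora.chat(tokenizer, query, history=[])[0])
```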
> Ah, I see what the issue is.
>
> We're using a custom GGUF model parser in aphrodite, so everything needs to be hand-written and implemented for...
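To give a flavor of what hand-writing a GGUF parser involves, here is a minimal header reader for the GGUF v2/v3 layout (my own sketch, not aphrodite's actual parser; the filename is a placeholder):

```python
# Read the fixed GGUF header: 4-byte magic, uint32 version, uint64 tensor
# count, uint64 metadata key/value count (little-endian, GGUF v2/v3 layout).
import struct

def read_gguf_header(path: str) -> tuple[int, int, int]:
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return version, n_tensors, n_kv

# Everything after the header (typed metadata values, tensor infos, alignment,
# the tensor data itself) also has to be decoded by hand.
print(read_gguf_header("model.Q5_K_M.gguf"))  # placeholder filename
```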
> @bash99 we have a PR at to fix this, and support arbitrary GGUF models.

I'm trying to build the dev branch, but I got this error even when I use update-runtime.sh...
> We unfortunately had the install condition for the punica and hadamard kernels using a comparison sign facing the wrong way. Fixed in the latest commit to dev.

I've tried it, and it...
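The bug class is easy to picture: a capability gate in the build setup whose comparison faces the wrong way. Illustrative only, not aphrodite's actual setup code:

```python
# Illustrative sketch of a flipped install condition for optional CUDA kernels.
import torch

capability = torch.cuda.get_device_capability()  # e.g. (8, 0) on A100

# Buggy: skips the kernels on exactly the GPUs that support them.
build_optional_kernels = capability < (8, 0)

# Fixed: build only on compute capability >= 8.0 (Ampere and newer).
build_optional_kernels = capability >= (8, 0)
```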
> No, it seems to be a bug somewhere else. In the meantime you can use GPTQ/AWQ/exl2 quants of the same model.

It seems to be some bug related to Qwen...
> No, it seems to be a bug somewhere else. In the meantime you can use GPTQ/AWQ/exl2 quants of the same model.

I've tried a GPTQ model made by LoneStriker...
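For anyone reproducing this, loading a GPTQ quant presumably looks like the vLLM-style entry point below; the model id is a placeholder and the exact kwargs on the dev branch are assumptions on my part.

```python
# Hedged sketch: running a GPTQ quant through aphrodite's offline Python API.
from aphrodite import LLM, SamplingParams

llm = LLM(
    model="LoneStriker/some-model-GPTQ",  # placeholder repo id
    quantization="gptq",
    tensor_parallel_size=2,  # tp>1, the configuration being debugged here
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello"], params)
print(outputs[0].outputs[0].text)
```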
> I seem to have found what is wrong. Currently, quants without merged weights (exl2, gguf) and models with linear bias (Qwen) are broken for tp>1. We are working on a...
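Rough intuition for the linear-bias half of that: in a row-parallel linear, each rank computes a partial matmul that is then summed by an all-reduce, so a bias added on every rank gets counted tp times. A toy single-process illustration (mine, not aphrodite's code):

```python
# Toy demo of why linear bias needs care under tensor parallelism: the
# partial outputs are summed across ranks, so per-rank bias adds stack up.
import torch

tp = 2
x, w, b = torch.randn(1, 8), torch.randn(8, 4), torch.randn(4)
ref = x @ w + b  # single-GPU reference

# Shard input/weight along the reduction dim, as row parallelism does.
x_shards, w_shards = x.chunk(tp, dim=1), w.chunk(tp, dim=0)

# Buggy: bias added before the all-reduce (sum), so it is counted tp times.
buggy = sum(xs @ ws + b for xs, ws in zip(x_shards, w_shards))

# Correct: reduce the partial matmuls first, add the bias exactly once.
fixed = sum(xs @ ws for xs, ws in zip(x_shards, w_shards)) + b

print(torch.allclose(ref, fixed))  # True
print(torch.allclose(ref, buggy))  # False: off by (tp - 1) * b
```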
> yeah the Q5_K_M gguf is 5.7bpw while the gptq 4bit g32 is 4.625bpw

but gptq 4bit has a bigger impact on instruction following; I had one prompt that works fine on 32b-Q5_K_M,...
> > yeah the Q5_K_M gguf is 5.7bpw while the gptq 4bit g32 is 4.625bpw
> >
> > but gptq 4bit has a bigger impact on instruction following; I had one prompt that works...
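The 4.625 figure falls out of simple storage accounting, assuming the common GPTQ packing of one fp16 scale plus one 4-bit zero point per 32-weight group (exact metadata overhead varies a little by implementation):

```python
# Back-of-envelope bits-per-weight for GPTQ 4-bit at group size 32.
bits, group, scale_bits, zero_bits = 4, 32, 16, 4
bpw = bits + (scale_bits + zero_bits) / group
print(bpw)  # 4.625; the ~5.7 bpw for Q5_K_M is llama.cpp's own accounting
```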