Aaron Chung comments

Results: 18 comments by Aaron Chung

Newbie question: is llama.cpp the only way to quantize?

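OK, thanks @ymcui and @bash99! 😊 After merging and the other steps, I also managed to deploy with both llama.cpp and hf. One problem remains, though: HF inference with alpaca 7b has almost no memory of the context. Does this also require parameter tuning? As for quantization, I'll first try out how gptq performs at inference. I also found that this error shows up easily whenever num_beams is anything other than 1, though that doesn't really matter... Traceback (most recent call last): File "/home/#####/Chinese-LLaMA-Alpaca/scripts/inference_hf.py", line 104, in generation_output = model.generate( ^^^^^^^^^^^^^^^ File "/home/#####/anaconda3/envs/llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args,...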

Training with Accelerator Fails. RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:7! (when checking argument for argument index in method wrapper__index_select)

same issue

chaglm2 loRA finetuning

> Modify the chat method in modeling_chatglm.py (I did not use the streaming variant) ###### if not history: prompt = query else: prompt = "" for i, (old_query, response) in enumerate(history): prompt += "[Round {}]\n问：{}\n答：{}\n".format(i, old_query, response) prompt += "[Round {}]\n问：{}\n答：".format(len(history),...

chaglm2 loRA finetuning

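> I found out why my fine-tuning had no effect: the native build_inputs in modeling_chatglm.py reorganizes your input into "[Round n] 问：*****************" and passes that to the tokenizer as the prompt to generate the inputs. The prompts supplied during fine-tuning, however, follow a specific pattern (for example, "Please look up ..."), so the extra layer that build_inputs wraps around them keeps chatglm from returning the expected result. Removing that outer layer and passing the query straight to the tokenizer gives output consistent with training.

So you mean that using just this line is enough? `inputs = self.build_inputs(tokenizer, query, history=history)`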
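The mismatch described in the quote can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: `wrap_in_round_template` is a hypothetical stand-in modeled on the snippet quoted in the earlier comment, not the real `build_inputs`, and the actual template in modeling_chatglm.py may differ (round numbering, whitespace). The training prompt is a made-up example.

```python
def wrap_in_round_template(query, history=None):
    """Hypothetical stand-in for the wrapping described in the quote.

    Not the real build_inputs: it only mimics the "[Round n] 问：... 答：" layout.
    """
    history = history or []
    prompt = ""
    for i, (old_query, response) in enumerate(history):
        prompt += "[Round {}]\n问：{}\n答：{}\n".format(i, old_query, response)
    # Assumption: the current query fills the final "问：" slot.
    prompt += "[Round {}]\n问：{}\n答：".format(len(history), query)
    return prompt


def raw_prompt(query):
    """Pass the query through unchanged, as the quote suggests doing."""
    return query


# Made-up prompt in the style used for fine-tuning.
training_prompt = "Please look up the order status."

wrapped = wrap_in_round_template(training_prompt)

# The wrapped text is no longer the string the model saw during training.
print(repr(wrapped))
print("wrapped matches training prompt:", wrapped == training_prompt)
print("raw matches training prompt:", raw_prompt(training_prompt) == training_prompt)
```

Whichever route is taken, the point of the quote is that the string handed to the tokenizer at inference time should have the same shape as the prompts used in the fine-tuning data.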

Aaron Chung

Newbie question: is llama.cpp the only way to quantize?

Training with Accelerator Fails. RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:7! (when checking argument for argument index in method wrapper__index_select)

chaglm2 loRA finetuning

chaglm2 loRA finetuning

chaglm2 loRA finetuning

fp8 support

Error after running on multiple GPUs

MMLU score drops after full-parameter fine-tuning of the llama3-8B-base model

Question on the source of commonsense_15k

Question about datasets variants