Ziqing Yang comments

Results 212 comments of Ziqing Yang

Questions about pre-training with run_clm

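> > Because pre-training builds on run_clm.py, we did not change its data format much. We also considered the impact of pre-training without ``, but ultimately concluded that it is minor, for the following reasons (admittedly, I do not know whether the original LLaMA used `` during pre-training):
> >
> > * For the pre-trained model, if you use it directly for generation, you can set eos to bos, because given how the pre-training data is organized, `` effectively also serves as the eos of the previous sentence.
> > * On the other hand, `` is added back in SFT training; our experiments also confirmed that the model can learn the meaning of the stop token during the SFT stage.
>
> Mainly, I tried the original llama and it does stop. bos and eos are normal, but after your Chinese incremental training it no longer stops, so I wondered whether this is the cause.

Are you testing Chinese-LLaMA? If the stop token is set to ``, then it indeed will not stop; that is the cause.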
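The stop-token behaviour discussed above can be illustrated with a toy decoding loop. This is a minimal sketch, not the project's code: `generate_until_stop` and `fake_next_token` are hypothetical names, and the ids 1 (bos) and 2 (eos) are assumed to follow the LLaMA tokenizer convention. It shows why a model that only ever saw bos between documents keeps generating when eos is the stop token, and why treating bos as the stop token makes it terminate.

```python
from typing import Callable, Iterable, List

BOS_ID = 1  # assumption: LLaMA tokenizer convention
EOS_ID = 2  # assumption: LLaMA tokenizer convention


def generate_until_stop(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    stop_ids: Iterable[int],
    max_new_tokens: int = 16,
) -> List[int]:
    """Append tokens until a stop id is produced or the budget runs out."""
    stop = set(stop_ids)
    output = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(output)
        if token in stop:
            break  # the stop token itself is not appended
        output.append(token)
    return output


def fake_next_token(context: List[int]) -> int:
    """Hypothetical stand-in for a model pre-trained without eos.

    After every third new token it emits bos, i.e. it starts a
    'new document' instead of emitting eos.
    """
    new_tokens = len(context) - 1  # the context starts with a single bos
    if new_tokens > 0 and new_tokens % 3 == 0:
        return BOS_ID
    return 100 + new_tokens


# Stopping on eos never triggers; stopping on bos ends after the first "document".
runs_on = generate_until_stop(fake_next_token, [BOS_ID], [EOS_ID], max_new_tokens=8)
stops = generate_until_stop(fake_next_token, [BOS_ID], [BOS_ID], max_new_tokens=8)
```

With Hugging Face `generate`, the analogous step would presumably be passing `eos_token_id=tokenizer.bos_token_id`; whether that suits a given checkpoint is an assumption to verify.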

ValueError: Attempting to unscale FP16 gradients.

The modules_to_save feature was somewhat unstable in recent versions of peft, so it is best to stick to the version at https://github.com/huggingface/peft/tree/13e53fc.

LLaMA is now in HuggingFace's main branch.\nPlease reinstall it: pip uninstall transformers && pip install git+https://github.com/huggingface/transformers.git

> In merge_llama_with_chinese_lora_to_hf, can `assert ("LlamaTokenizer" in transformers._import_structure["models.llama"]), "LLaMA is now in HuggingFace's main branch.\nPlease reinstall it: pip uninstall transformers && pip install git+https://github.com/huggingface/transformers.git"` be removed? I installed the latest transformers and still get this error, but the conversion works once I remove it.

I installed the latest Transformers, and this assert does not fail for me.

After merging into the HF format, can the model be used directly as the base_models for alpaca_lora?

Once merged, the LoRA weights are no longer needed, so there should be no need to supply lora_weights, right?

After merging into the HF format, can the model be used directly as the base_models for alpaca_lora?

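> I thought of a way to use the GPU: when merging the model weights, merge them into the HF format, use the result directly as the base_models for the original alpaca_lora, and run inference with the original Alpaca-LoRA together with the lora_weights from HF. I tested this myself: loading the model succeeded, but an error was raised after loading. Has anyone with the same idea tried this? I am not sure whether the model was converted incorrectly.
>
> I came up with an idea to use GPU, merging the model weights by selecting the HF script during the merge, and...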

size mismatch error

Check the model weights; it looks like the 13B and 7B weights/configs have been mixed up.
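One quick way to spot such a mix-up is to compare the shapes recorded in config.json with the published LLaMA shapes. The following is a minimal sketch under that assumption; `infer_llama_size` is a hypothetical helper, not part of the project.

```python
from typing import Optional

# Published LLaMA shapes: (hidden_size, num_hidden_layers) -> size label.
LLAMA_SHAPES = {
    (4096, 32): "7B",
    (5120, 40): "13B",
}


def infer_llama_size(config: dict) -> Optional[str]:
    """Return "7B" or "13B" if the config matches a known shape, else None."""
    key = (config.get("hidden_size"), config.get("num_hidden_layers"))
    return LLAMA_SHAPES.get(key)


# A config with 13B shapes would not match weights downloaded as 7B.
example_config = {"hidden_size": 5120, "num_hidden_layers": 40}
detected = infer_llama_size(example_config)
```

If the detected size differs from the size of the weights being loaded, the config and the weights probably come from different models.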

Full model size mismatch 49954 vs. 32000? RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:

Is vocab_size in config.json set to 49954? If not, change it and try again.

Full model size mismatch 49954 vs. 32000? RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:

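Judging from the error message, the checkpoint has the correct size (49954), but the initialized model's embedding has size 32000, so the weights cannot be loaded. The problem should lie only in config.json. Could you show the contents of Chinese-LLaMA-Alpaca/merged_models/chinese_alpaca_merged_7b-hf/config.json after changing vocab_size?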
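The config.json check described here can be sketched with the standard library alone. `fix_vocab_size` is a hypothetical helper, not part of the project; the value 49954 comes from the error message in this thread, and the path in the usage comment is the one quoted above.

```python
import json
from pathlib import Path


def fix_vocab_size(config_path: str, checkpoint_vocab_size: int) -> bool:
    """Make vocab_size in config.json match the checkpoint.

    Returns True if the file was changed, False if it already matched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("vocab_size") == checkpoint_vocab_size:
        return False
    config["vocab_size"] = checkpoint_vocab_size
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


# Usage (path as quoted in the thread):
# fix_vocab_size("merged_models/chinese_alpaca_merged_7b-hf/config.json", 49954)
```

The model has to be re-initialized from the directory after the change, since the embedding size is fixed when the model is constructed.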

Full model size mismatch 49954 vs. 32000? RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:

> Why did it fail again at the next step?
>
> File ~/anaconda3/envs/env_Chinese_LLaMA/lib/python3.10/site-packages/transformers/generation/utils.py:1524, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, streamer, **kwargs) ... 2208 # remove once script supports set_grad_enabled 2209 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type) ->...

During multi-GPU training, GPU 0 fills up, causing CUDA out of memory

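Is the multi-GPU training using the deepspeed ZeRO-2 strategy? We recommend using deepspeed for multi-GPU training. We also provide a deepspeed config file and have updated the tutorials accordingly; see the [pre-training script](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/预训练脚本) and the [instruction fine-tuning script](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/指令精调脚本).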
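For orientation, a ZeRO-2 configuration could look roughly like the sketch below. It is illustrative only and is NOT the config file shipped with the project. The keys are standard DeepSpeed options, the `"auto"` values assume the Hugging Face Trainer integration fills them in, and `write_deepspeed_config` is a hypothetical helper.

```python
import json

# Illustrative ZeRO-2 settings (assumption: not the project's actual file).
zero2_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                    # shard optimizer states and gradients
        "overlap_comm": True,          # overlap gradient reduction with backward
        "contiguous_gradients": True,  # reduce memory fragmentation
        "reduce_scatter": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def write_deepspeed_config(path: str, config: dict) -> None:
    """Write the config as JSON so a launcher can read it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

The resulting file would typically be passed to the training script through the Trainer's `--deepspeed` argument; the project's wiki pages linked above remain the authoritative source for the actual settings.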