YSLLYW

Results: 21 comments by YSLLYW

> @YSLLYW Yes. I figured it out. Should add torchrun.

Yes, use `torchrun --nproc_per_node=2 --master_port=1234 finetune.py`.
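
For reference, a minimal launch sketch along the lines of the comment above; the `CUDA_VISIBLE_DEVICES` restriction is an added assumption, and `finetune.py` is the script named in the comment:

```shell
# Start one worker process per GPU on a single node.
# --master_port just needs to be a free port; adjust --nproc_per_node
# to the number of GPUs you expose.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py
```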

> Hi @imrankh46, thanks for the issue! We are aware of it; for now the solution is to pass `device_map={"":0}` when calling `PeftModel.from_pretrained`. We will work on a proper...
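
In code, that workaround would look roughly like the sketch below; the base model name and adapter path are placeholders, not values from the thread:

```python
from transformers import AutoModel
from peft import PeftModel

base = "THUDM/chatglm-6b"    # placeholder base model
adapter = "./lora-adapter"   # placeholder adapter directory

model = AutoModel.from_pretrained(base, trust_remote_code=True).half()
# Force every module onto GPU 0 so the base weights and the LoRA adapter
# end up on the same device.
model = PeftModel.from_pretrained(model, adapter, device_map={"": 0})
```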

> @YSLLYW Can you try by installing `accelerate` from source?
>
> ```shell
> pip install git+https://github.com/huggingface/accelerate
> ```

Yes, I just updated the PEFT version to 0.3.0 and resolved...

> > @YSLLYW Can you try by installing `accelerate` from source?
> >
> > ```shell
> > pip install git+https://github.com/huggingface/accelerate
> > ```
> >
> > I already solved the issues. Thanks...

> Same question here. I have 8 V100 cards interconnected with NVLink, but right now training only works on a single card; with more than two cards it throws an error. What could the reason be? Do I need to enable NVLink, and if so, how? Additional note: I previously trained on two PCIe V100 cards. Two cards did run, but because of the communication overhead between them, training was actually slower than on a single card. Thanks!

It's a problem with the code. Take a look at the LoRA version: that project supports multiple GPUs, and it also has a multi-GPU p-tuning version.
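
One way to check whether the V100s in the question above are actually communicating over NVLink is to print the GPU topology with `nvidia-smi`; this is a general diagnostic, not a step from the original thread:

```shell
# Print the GPU interconnect matrix: links reported as NV1/NV2/... use NVLink,
# while PIX/PHB/SYS indicate PCIe or cross-socket paths.
nvidia-smi topo -m
```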

> > > > The ChatGLM-6B huggingface repo has been updated; you need to re-download the model and then run it again (the IDs of some of the official special tokens have changed again)
> > >
> > > This address: [https://huggingface.co/THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)?
> >
> > Yes, and then the model and tokenizer inside it...
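
A minimal sketch of re-downloading the updated checkpoint with `huggingface_hub`; the `force_download` flag is an assumption about how to refresh the cache, not an instruction from the thread:

```python
from huggingface_hub import snapshot_download

# Fetch the updated ChatGLM-6B weights and tokenizer files,
# bypassing any stale cached copy.
local_dir = snapshot_download("THUDM/chatglm-6b", force_download=True)
print(local_dir)
```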

> Found cached dataset json (/home/ub2004/.cache/huggingface/datasets/json/default-6eef2a44d8479e8f/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
> 100%|██████████| 1/1 [00:00
>
> Traceback (most recent call last):
>   File "/home/ub2004/.local/bin/torchrun", line 8, in <module>
>     sys.exit(main())
>   File "/home/ub2004/.local/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
>     return f(*args, **kwargs)
>   File "/home/ub2004/.local/lib/python3.8/site-packages/torch/distributed/run.py", line...

> I found the following information online, but I don't understand the rest. Can anyone help me?
>
> tokenizer = AutoTokenizer.from_pretrained("base-model", trust_remote_code=True)
> model = AutoModel.from_pretrained("base-model", trust_remote_code=True).half().cuda()
> peft_config = LoraConfig( task_type=TaskType.CAUSAL_LM, inference_mode=False,...
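
The quoted snippet breaks off inside `LoraConfig`; a complete version of that setup might look like the sketch below. The rank, alpha, dropout, and target module names are illustrative assumptions, and `"base-model"` stays a placeholder:

```python
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("base-model", trust_remote_code=True)
model = AutoModel.from_pretrained("base-model", trust_remote_code=True).half().cuda()

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,                                 # assumed LoRA rank
    lora_alpha=32,                       # assumed scaling factor
    lora_dropout=0.1,                    # assumed dropout
    target_modules=["query_key_value"],  # assumed ChatGLM-style attention projection
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()       # shows how few parameters LoRA trains
```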

> 1. For incremental pretraining with LoRA, loading llama-7b in fp16 needs 14 GB of VRAM; with a block size (context length) of 1024, 24 GB of VRAM is enough, with the batch size adjusted accordingly;
> 2. For full-parameter training, llama-7b needs 4 x 32 GB GPUs.

For the full-parameter training you used here, did you use DeepSpeed? If not, roughly how much VRAM does it need?
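
The rough arithmetic behind those numbers, as a back-of-the-envelope sketch; the 16-bytes-per-parameter figure assumes mixed-precision Adam with fp32 master weights and optimizer states, which is an assumption rather than something stated in the comment:

```python
params = 7e9  # llama-7b

# fp16 weights only (the LoRA base model): 2 bytes per parameter.
fp16_weights_gb = params * 2 / 1024**3
print(f"fp16 weights: ~{fp16_weights_gb:.0f} GB")           # ~13 GB, i.e. the "14 GB" figure

# Full-parameter mixed-precision training with Adam:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + Adam m (4) + Adam v (4) = 16 bytes per parameter, before activations.
full_train_gb = params * 16 / 1024**3
print(f"full fine-tuning states: ~{full_train_gb:.0f} GB")  # ~104 GB -> 4 x 32 GB cards
```

With DeepSpeed ZeRO, the gradient and optimizer states are sharded across the GPUs, which is how 4 x 32 GB cards can hold what no single card can.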

> layers.27, final_layernorm, and lm_head must be on the same card. Change that.

I followed this repo's code exactly, but I still get the error, same as above: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
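
In practice the quoted advice amounts to editing the `device_map` so the last transformer block, the final layer norm, and the output head share one GPU. A sketch of what that could look like for ChatGLM-6B; the module names follow the usual ChatGLM-6B layout and the two-GPU split point is an assumption:

```python
import torch
from transformers import AutoModel

# Pin the embedding and the first half of the 28 transformer blocks to GPU 0,
# and keep the last block together with final_layernorm and lm_head on GPU 1,
# so the layer-norm weights and the hidden states sit on the same device.
device_map = {"transformer.word_embeddings": 0}
for i in range(14):
    device_map[f"transformer.layers.{i}"] = 0
for i in range(14, 28):
    device_map[f"transformer.layers.{i}"] = 1
device_map["transformer.final_layernorm"] = 1
device_map["lm_head"] = 1

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
```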