YSLLYW

Results: 21 comments by YSLLYW

> @YSLLYW Yes. I figured it out. Should add torchrun.

Yes, use `torchrun --nproc_per_node=2 --master_port=1234 finetune.py`.
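
For reference, a minimal launch sketch along the lines of the comment above; the `CUDA_VISIBLE_DEVICES` restriction is an added assumption, and `finetune.py` is the script named in the comment:

```shell
# Start one worker process per GPU on a single node.
# --master_port just needs to be a free port; adjust --nproc_per_node
# to the number of GPUs you expose.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py
```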

> Hi @imrankh46, thanks for the issue! We are aware of it; for now the solution is to pass `device_map={"":0}` when calling `PeftModel.from_pretrained`. We will work on a proper...
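
In code, that workaround would look roughly like the sketch below; the base model name and adapter path are placeholders, not values from the thread:

```python
from transformers import AutoModel
from peft import PeftModel

base = "THUDM/chatglm-6b"    # placeholder base model
adapter = "./lora-adapter"   # placeholder adapter directory

model = AutoModel.from_pretrained(base, trust_remote_code=True).half()
# Force every module onto GPU 0 so the base weights and the LoRA adapter
# end up on the same device.
model = PeftModel.from_pretrained(model, adapter, device_map={"": 0})
```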

> @YSLLYW Can you try by installing `accelerate` from source?
>
> ```shell
> pip install git+https://github.com/huggingface/accelerate
> ```

Yes, I just updated the PEFT version to 0.3.0 and resolved...

> > @YSLLYW Can you try by installing `accelerate` from source?
> >
> > ```shell
> > pip install git+https://github.com/huggingface/accelerate
> > ```
> >
> > I already solved the issues. Thanks...

> Same question here. I have 8 V100 cards interconnected with NVLink, but right now training only works on a single card; with more than two cards it throws an error. What could the reason be? Do I need to enable NVLink, and if so, how? Additional note: I previously trained on two PCIe V100 cards. Two cards did run, but because of the communication overhead between them, training was actually slower than on a single card. Thanks!

It's a problem with the code. Take a look at the LoRA version: that project supports multiple GPUs, and it also has a multi-GPU p-tuning version.
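
One way to check whether the V100s in the question above are actually communicating over NVLink is to print the GPU topology with `nvidia-smi`; this is a general diagnostic, not a step from the original thread:

```shell
# Print the GPU interconnect matrix: links reported as NV1/NV2/... use NVLink,
# while PIX/PHB/SYS indicate PCIe or cross-socket paths.
nvidia-smi topo -m
```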

> > > > The ChatGLM-6B huggingface repo has been updated; you need to re-download the model and then run it again (the IDs of some of the official special tokens have changed again)
> > >
> > > This address: [https://huggingface.co/THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)?
> >
> > Yes, and then the model and tokenizer inside it...
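
A minimal sketch of re-downloading the updated checkpoint with `huggingface_hub`; the `force_download` flag is an assumption about how to refresh the cache, not an instruction from the thread:

```python
from huggingface_hub import snapshot_download

# Fetch the updated ChatGLM-6B weights and tokenizer files,
# bypassing any stale cached copy.
local_dir = snapshot_download("THUDM/chatglm-6b", force_download=True)
print(local_dir)
```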

> Found cached dataset json (/home/ub2004/.cache/huggingface/datasets/json/default-6eef2a44d8479e8f/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
> 100%|██████████| 1/1 [00:00
>
> Traceback (most recent call last):
>   File "/home/ub2004/.local/bin/torchrun", line 8, in <module>
>     sys.exit(main())
>   File "/home/ub2004/.local/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
>     return f(*args, **kwargs)
>   File "/home/ub2004/.local/lib/python3.8/site-packages/torch/distributed/run.py", line...

> I found the following information online, but I don't understand the rest. Can anyone help me?
>
> tokenizer = AutoTokenizer.from_pretrained("base-model", trust_remote_code=True)
> model = AutoModel.from_pretrained("base-model", trust_remote_code=True).half().cuda()
> peft_config = LoraConfig( task_type=TaskType.CAUSAL_LM, inference_mode=False,...
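
The quoted snippet breaks off inside `LoraConfig`; a complete version of that setup might look like the sketch below. The rank, alpha, dropout, and target module names are illustrative assumptions, and `"base-model"` stays a placeholder:

```python
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("base-model", trust_remote_code=True)
model = AutoModel.from_pretrained("base-model", trust_remote_code=True).half().cuda()

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,                                 # assumed LoRA rank
    lora_alpha=32,                       # assumed scaling factor
    lora_dropout=0.1,                    # assumed dropout
    target_modules=["query_key_value"],  # assumed ChatGLM-style attention projection
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()       # shows how few parameters LoRA trains
```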

> 1. For incremental pretraining with LoRA, loading llama-7b in fp16 needs 14 GB of VRAM; with a block size (context length) of 1024, 24 GB of VRAM is enough, with the batch size adjusted accordingly;
> 2. For full-parameter training, llama-7b needs 4 x 32 GB GPUs.

For the full-parameter training you used here, did you use DeepSpeed? If not, roughly how much VRAM does it need?
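
The rough arithmetic behind those numbers, as a back-of-the-envelope sketch; the 16-bytes-per-parameter figure assumes mixed-precision Adam with fp32 master weights and optimizer states, which is an assumption rather than something stated in the comment:

```python
params = 7e9  # llama-7b

# fp16 weights only (the LoRA base model): 2 bytes per parameter.
fp16_weights_gb = params * 2 / 1024**3
print(f"fp16 weights: ~{fp16_weights_gb:.0f} GB")           # ~13 GB, i.e. the "14 GB" figure

# Full-parameter mixed-precision training with Adam:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + Adam m (4) + Adam v (4) = 16 bytes per parameter, before activations.
full_train_gb = params * 16 / 1024**3
print(f"full fine-tuning states: ~{full_train_gb:.0f} GB")  # ~104 GB -> 4 x 32 GB cards
```

With DeepSpeed ZeRO, the gradient and optimizer states are sharded across the GPUs, which is how 4 x 32 GB cards can hold what no single card can.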

> layers.27, final_layernorm, and lm_head must be on the same card. Change that.

I followed this repo's code exactly, but I still get the error, same as above: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
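
In practice the quoted advice amounts to editing the `device_map` so the last transformer block, the final layer norm, and the output head share one GPU. A sketch of what that could look like for ChatGLM-6B; the module names follow the usual ChatGLM-6B layout and the two-GPU split point is an assumption:

```python
import torch
from transformers import AutoModel

# Pin the embedding and the first half of the 28 transformer blocks to GPU 0,
# and keep the last block together with final_layernorm and lm_head on GPU 1,
# so the layer-norm weights and the hidden states sit on the same device.
device_map = {"transformer.word_embeddings": 0}
for i in range(14):
    device_map[f"transformer.layers.{i}"] = 0
for i in range(14, 28):
    device_map[f"transformer.layers.{i}"] = 1
device_map["transformer.final_layernorm"] = 1
device_map["lm_head"] = 1

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
```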