jiaqiw09
I ran into the same problem: ZeRO-3 + SFT hangs at the very first step that would print the training loss.
> I suspect a communication problem. After upgrading NCCL, some people running DeepSpeed ZeRO-3 + LoRA hit `Invalidate trace cache @ step 738: expected module 752, but got module 784` partway through training, and then it keeps hanging. I don't have NVLink here; could someone try `export NCCL_P2P_LEVEL=NVL` and ping me if DeepSpeed ZeRO-3 + LoRA runs through after setting it?

I tried that on my side with no effect; it still hangs at the first step. My nvcc version is cuda_12.1.r12.1/compiler.32415258_0.
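For anyone who wants to set this from inside a launcher script rather than the shell, here is a minimal sketch. Only the variable name and value come from the quoted suggestion; everything else (the `torch.distributed` setup) is illustrative and assumes a standard `torchrun`-style environment:

```python
import os

# NCCL_P2P_LEVEL=NVL tells NCCL to use peer-to-peer transfers only between
# GPUs connected via NVLink. It must be set before NCCL initializes.
os.environ["NCCL_P2P_LEVEL"] = "NVL"

import torch.distributed as dist

# Illustrative only: any DeepSpeed/torchrun entry point that initializes the
# process group after this point will pick the variable up.
dist.init_process_group(backend="nccl")
```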
> > Mixtral may not support ZeRO-3
>
> Then was ZeRO-3 not used to train this model either? How was it trained?

@hiyouga, was the fine-tuning done with ZeRO-2 plus quantization?
> A temporary workaround is, if the error is
>
> > RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and...
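The quoted workaround is cut off above, so the following is not necessarily what it said. As a generic illustration only (an assumption on my part, with a placeholder model name), one common way to avoid this class of error is to pin the whole model to a single device instead of letting it shard across GPUs:

```python
from transformers import AutoModelForCausalLM

# Hypothetical illustration: the empty-string key in device_map places every
# module on the given device, so no tensors end up split across cuda:0/cuda:3.
model = AutoModelForCausalLM.from_pretrained("some_model", device_map={"": 0})
```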
@haileyschoelkopf @lintangsutawika would you mind taking a look? Best
@haileyschoelkopf @lintangsutawika thanks for your suggestion, and thanks to @statelesshz for helping fix the code. I have just tested the code on both NPU and GPU, and all three methods work. Would you mind having...
I have already made a PR (https://github.com/EleutherAI/lm-evaluation-harness/pull/1787); it should be a better way to show how to adapt an NPU to huggingface.py.
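For context on what "adapting an NPU" can look like, here is a minimal sketch of the device-selection pattern involved. The helper name `resolve_device` is hypothetical, and this is not the PR's actual diff:

```python
import torch

def resolve_device() -> str:
    """Hypothetical helper: prefer CUDA, then an Ascend NPU, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    try:
        # Importing torch_npu registers the "npu" backend on torch.
        import torch_npu  # noqa: F401
        if torch.npu.is_available():
            return "npu"
    except ImportError:
        pass
    return "cpu"
```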
@haileyschoelkopf Hi, would you mind taking a look?
@gspeter-max thanks for your PR. I just checked it on Ascend NPUs with torch 2.1.0 and torch_npu 2.1.0, and it works. Here is my test code:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" dispatches the weights onto the available NPU(s)
model = AutoModelForCausalLM.from_pretrained("qwen2_7b", device_map="auto")
```

cc @SunMarc
> Thanks a lot for verifying @jiaqiw09! Great to hear it works well on Ascend NPUs with torch 2.1.0 and torch_npu 2.1.0. Let me know if there’s anything else needed...