
Single-machine multi-GPU training of chat_glm is broken

Open cxj01 opened this issue 1 year ago • 6 comments

Using the repository code, the model loads onto only one GPU even though the machine has two. If I manually assign layers to different GPUs, I get an error saying the tensors are not all on the same device.

cxj01 · Apr 20 '23 06:04

Pay attention to the model code — use the code I provided.

yuanzhoulvpi2017 · Apr 20 '23 06:04

@yuanzhoulvpi2017 I am using exactly the code from this repository; the only change is that I moved the last two layers onto the other GPU. [screenshots]

cxj01 · Apr 20 '23 06:04

layers.27, final_layernorm, and lm_head must all be on the same GPU. Change that.

yuanzhoulvpi2017 · Apr 20 '23 07:04
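A minimal sketch of the constraint described above, assuming the module names of the ChatGLM-6B checkpoint layout (`transformer.layers.*`, `transformer.final_layernorm`, `lm_head` — adjust to your model). The helper builds a `device_map` that splits the 28 transformer layers across two GPUs while keeping the last layer, the final layernorm, and the LM head together on GPU 1:

```python
# Hypothetical sketch: build a two-GPU device_map for a 28-layer
# ChatGLM-style model. The final_layernorm and lm_head consume the
# output of layers.27 directly, so all three must share one device.

def make_device_map(n_layers=28, split=14):
    """Place layers [0, split) on GPU 0 and the rest, plus the head, on GPU 1."""
    device_map = {"transformer.word_embeddings": 0}
    for i in range(n_layers):
        device_map[f"transformer.layers.{i}"] = 0 if i < split else 1
    # Must match the device of the last transformer layer (here: GPU 1):
    device_map["transformer.final_layernorm"] = 1
    device_map["lm_head"] = 1
    return device_map

# Usage (requires transformers and two GPUs):
# model = AutoModel.from_pretrained("THUDM/chatglm-6b",
#                                   trust_remote_code=True,
#                                   device_map=make_device_map())
```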

> layers.27, final_layernorm, and lm_head must all be on the same GPU. Change that.

I followed this repository's code exactly, but I get the same error as above: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

YSLLYW · Apr 30 '23 07:04

> layers.27, final_layernorm, and lm_head must all be on the same GPU. Change that.

I followed this repository's code exactly, but I get the same error as above: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

The sample doesn't run.

YSLLYW · Apr 30 '23 07:04

> layers.27, final_layernorm, and lm_head must all be on the same GPU. Change that.
>
> I followed this repository's code exactly, but I get the same error as above: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

Locate the layer where it fails and call input.to() with the same device as that layer's weights; then it runs.

Ardang666 · Jul 01 '23 05:07
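The workaround above can be sketched with a PyTorch forward pre-hook (a hypothetical illustration, not code from this repository): the hook moves each incoming tensor onto the device of the module's own weights, so a layer placed on cuda:1 can accept activations arriving from cuda:0.

```python
# Hypothetical sketch: a forward pre-hook that moves inputs onto the
# device of the module's parameters before the forward pass runs.
import torch
import torch.nn as nn

def move_input_to_weight_device(module, inputs):
    target = next(module.parameters()).device  # device of this module's weights
    return tuple(
        t.to(target) if isinstance(t, torch.Tensor) else t for t in inputs
    )

# Toy LayerNorm standing in for the layer where the error occurs
# (e.g. transformer.final_layernorm):
norm = nn.LayerNorm(8)
norm.register_forward_pre_hook(move_input_to_weight_device)
out = norm(torch.randn(2, 8))  # on CPU this move is a no-op
```

On a real multi-GPU setup one would register this hook on the failing layer (or on every layer at a device boundary); the cleaner fix remains assigning the tail modules to a single device as yuanzhoulvpi2017 suggested.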