zero_nlp
Single-machine multi-GPU training of chat_glm is broken
Using the repo code, even though the machine has two GPUs, the model still only gets loaded onto one of them. If I assign the individual layers to different GPUs, I get an error saying the tensors are not all on the same device.
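A quick way to confirm which GPUs the weights actually landed on (a minimal sketch, not from the repo; it assumes `model` has already been loaded):

```python
from collections import Counter

# Count how many parameters sit on each device; if only cuda:0 shows up,
# the second GPU is not being used at all.
print(Counter(str(p.device) for p in model.parameters()))
```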
Check the model code and use the code I provided.
@yuanzhoulvpi2017
I am using exactly the code from this repository; the only change is that I moved the last two layers onto the other GPU.
layers.27, final_layernorm and lm_head must be on the same card. Change that.
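A minimal sketch of a `device_map` that follows this advice (not taken from the repo; the module names such as `transformer.layers.*` and `transformer.final_layernorm` are assumed from the public THUDM/chatglm-6b checkpoint and may differ between versions):

```python
from transformers import AutoModel

# Split the 28 transformer blocks across two GPUs, but keep layers.27,
# final_layernorm and lm_head together on the same device.
device_map = {"transformer.word_embeddings": 0}
for i in range(14):
    device_map[f"transformer.layers.{i}"] = 0
for i in range(14, 28):
    device_map[f"transformer.layers.{i}"] = 1
device_map["transformer.final_layernorm"] = 1
device_map["lm_head"] = 1

model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True, device_map=device_map
).half()
```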
Following the repo code exactly, but it errors out, same as above: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)
The example doesn't run.
Locate the layer from the traceback and call input.to(the same device as its weights); then it runs through.
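Illustratively, the patch described above amounts to something like the following (a hypothetical sketch; `hidden_states` and `final_layernorm` are assumed names from the ChatGLM modeling code, and the traceback points at the LayerNorm call):

```python
# Move the input onto the same device as the layer's weight right before
# the call that raised the device-mismatch error.
hidden_states = hidden_states.to(self.final_layernorm.weight.device)
hidden_states = self.final_layernorm(hidden_states)
```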