Doc2EDAG

error

Open Wangpeiyi9979 opened this issue 4 years ago • 3 comments

Hello, and thank you for your work. I have four GPUs with IDs 0, 1, 2, and 3, and I ran the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3 ./train_multi.sh 4 --task_name [TASK_NAME] --gradient_accumulation_steps 16

It failed with the following error: subprocess.CalledProcessError: Command '['/home/wpy/miniconda3/envs/breakBert/bin/python', '-u', 'run_dee_task.py', '--local_rank=3', '--task_name', '[TASK_NAME]', '--gradient_accumulation_steps', '16']' returned non-zero exit status 1.

Why does this happen?

Wangpeiyi9979 • Oct 02 '20 23:10

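Because the training uses DDP, a more detailed error message is needed to locate the problem. Please first confirm that your libraries and their versions match those listed in the README.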
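One way to gather the version information for that comparison is sketched below. It is only an illustration: the package names in `PACKAGES` are placeholders, not the list from the README, and should be replaced with whatever the README actually pins. It also assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
import platform
import sys
from importlib import metadata

# Hypothetical list: replace with the package names pinned in the README.
PACKAGES = ["torch", "numpy"]


def collect_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_report(versions):
    """Build a plain-text report that can be pasted into an issue comment."""
    lines = [f"python {platform.python_version()} ({sys.executable})"]
    for name, version in sorted(versions.items()):
        shown = version if version is not None else "NOT INSTALLED"
        lines.append(f"{name} {shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_versions(PACKAGES)))
```

Running it with the same interpreter that appears in the error message (here the one under the `breakBert` conda environment) reports the versions that the failing worker processes actually see.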

Spico197 • Dec 08 '20 09:12

Has this problem been solved? I ran into a similar error.

KyrieIrving24 • Jul 17 '21 09:07

Has this problem been solved?

liufan777 • Oct 10 '21 12:10