Firefly-LLaMA2-Chinese

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Open haonanye opened this issue 1 year ago • 0 comments

Error operation not supported at line 351 in file /home/tim/git/bitsandbytes/csrc/pythonInterface.c
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 31779) of binary: /root/miniconda3/envs/chatglm_ft/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/chatglm_ft/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/chatglm_ft/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
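For anyone triaging this: the traceback comes from the torchrun launcher, which raises ChildFailedError whenever a worker process exits with a non-zero code. The worker's own message is what was printed just before the launcher's ERROR line, here the bitsandbytes one. The following is a minimal sketch, not part of the original report, for pulling those two pieces out of a saved log. The helper names `parse_worker_failure` and `lines_before_failure` are hypothetical, and the sketch assumes the log has one message per line.

```python
import re

# Matches the summary that torch.distributed.elastic logs for a failed worker,
# in the form seen in this report: "failed (exitcode: 1) local_rank: 0 (pid: 31779)".
FAILED_RE = re.compile(
    r"failed \(exitcode: (?P<exitcode>-?\d+)\) "
    r"local_rank: (?P<local_rank>\d+) \(pid: (?P<pid>\d+)\)"
)


def parse_worker_failure(log_text):
    """Return exitcode, local_rank and pid of the first failed worker, or None."""
    match = FAILED_RE.search(log_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def lines_before_failure(log_text, keep=5):
    """Return up to `keep` non-empty lines printed before the failure summary."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if FAILED_RE.search(line):
            return lines[max(0, index - keep):index]
    return []


# The first two lines of the log from this issue, one message per line.
sample = (
    "Error operation not supported at line 351 in file "
    "/home/tim/git/bitsandbytes/csrc/pythonInterface.c\n"
    "ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) "
    "local_rank: 0 (pid: 31779) of binary: "
    "/root/miniconda3/envs/chatglm_ft/bin/python\n"
)

print(parse_worker_failure(sample))
print(lines_before_failure(sample))
```

The lines returned by `lines_before_failure` are the ones worth quoting when searching for or reporting the underlying error, since the ChildFailedError text itself carries no detail about the cause.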

Environment: CUDA 11.7, CentOS Linux release 7.7.1908 (Core)

haonanye, Nov 10 '23 09:11