ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 6914) of binary: /usr/bin/python3
I tried this on Colab:

!torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 64 --max_batch_size 1  # instead of 4

and I'm getting the following error:
initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 6914) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=====================================================
example_text_completion.py FAILED
-----------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
-----------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2023-08-02_02:47:55
  host       : c6666b425cdc
  rank       : 0 (local_rank: 0)
  exitcode   : -9 (pid: 6914)
  error_file : <N/A>
  traceback  : Signal 9 (SIGKILL) received by PID 6914
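For context: exitcode -9 means the child process received SIGKILL, and on a setup like this it is usually the Linux OOM killer terminating the run while the 7B checkpoint is being loaded (roughly 13-14 GiB in fp16, which is more than the roughly 12 GiB of system RAM a free Colab instance typically has). A quick sanity check along the lines below can tell you whether the machine has enough memory before launching torchrun. This is only a sketch: it assumes psutil is installed, and the 14 GiB threshold is my rough estimate for llama-2-7b, not an official number.

import psutil

# Rough fp16 footprint of the llama-2-7b checkpoint (assumption: ~2 bytes per parameter).
REQUIRED_GIB = 14

mem = psutil.virtual_memory()
available_gib = mem.available / 1024**3
print(f"Available RAM: {available_gib:.1f} GiB (total {mem.total / 1024**3:.1f} GiB)")

if available_gib < REQUIRED_GIB:
    print(f"Likely not enough RAM to load the 7B weights (~{REQUIRED_GIB} GiB needed); "
          "expect the OOM killer to send SIGKILL (exitcode -9).")

If that check fails, the fix is more memory (a bigger instance / Colab Pro high-RAM runtime), not smaller max_seq_len or max_batch_size, since the kill happens while the weights are still loading.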
Same issue on an Apple M1.
I have the same issue. Does anyone have a solution?
Facing this on AWS EC2. Did anyone find a solution?
@ideepankarsharma2003, any update on this issue? Help appreciated.
There are no details about the context. Can you run nvidia-smi?
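If nvidia-smi is not available on the runtime (or there is no GPU at all), the same basic information can be reported from Python with standard torch.cuda calls; a minimal sketch of what would be useful to post here:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
    print(f"CUDA version built into torch: {torch.version.cuda}")
else:
    print("No CUDA device visible to PyTorch -- the example will run on CPU "
          "and is much more likely to be killed for lack of RAM.")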
@EmanuelaBoros I have the same issue. Can you help me?
@suannairen Without knowing anything about the system, it's difficult to say. What OS? And can you run nvidia-smi (in case it's possible)?
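Another data point worth posting is whether the kernel actually OOM-killed the process. Where the kernel log is readable (it may require root, and often isn't accessible inside a Colab or Docker container), something like this sketch just greps dmesg output for the tell-tale entries:

import subprocess

# Read the kernel log; may need root or be unavailable in containers (assumption).
try:
    log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
except (OSError, subprocess.CalledProcessError) as exc:
    log = ""
    print(f"Could not read dmesg: {exc}")

oom_lines = [line for line in log.splitlines()
             if "killed process" in line.lower() or "out of memory" in line.lower()]
print("\n".join(oom_lines) if oom_lines else "No OOM-killer entries found.")

A "Killed process ... (python3)" line there would confirm the exitcode -9 is a memory issue rather than anything specific to the llama code.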
Same problem for me; I don't know why.