code-llama-for-vscode

When I execute `torchrun --nproc_per_node 1 llamacpp_mock_api.py`, the following error occurs.

HwJhx opened this issue 2 years ago • 2 comments

torchrun --nproc_per_node 1 llamacpp_mock_api.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 16713) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
======================================================
llamacpp_mock_api.py FAILED


Failures: <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time       : 2023-09-04_12:12:41
  host       : 13edd873e909
  rank       : 0 (local_rank: 0)
  exitcode   : -9 (pid: 16713)
  error_file : <N/A>
  traceback  : Signal 9 (SIGKILL) received by PID 16713
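Exit code -9 means the child process was terminated by SIGKILL, which on Linux is most often the kernel's OOM killer stopping the process while the fp16 checkpoint is being loaded into CPU RAM. A rough back-of-envelope check (a sketch, assuming the 7B checkpoint is stored as fp16, 2 bytes per parameter) can show whether the machine even has enough physical memory:

```python
# Sketch: estimate whether this host can hold the CodeLlama-7b
# checkpoint in RAM during loading. Assumptions: ~7B parameters,
# fp16 weights (2 bytes each); peak usage during load can be higher.
import os

PARAMS = 7_000_000_000      # approximate parameter count of CodeLlama-7b-Instruct
BYTES_PER_PARAM = 2         # fp16
needed_gib = PARAMS * BYTES_PER_PARAM / 2**30

# Total physical RAM via POSIX sysconf (works on Linux)
total_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30

print(f"checkpoint needs roughly {needed_gib:.1f} GiB; machine has {total_gib:.1f} GiB")
if total_gib < needed_gib:
    print("Likely OOM: the kernel may SIGKILL the loader (exit code -9).")
```

With ~13 GiB needed for the weights alone, a host with 16 GiB or less of RAM (common on free Colab/cloud tiers) can easily be killed mid-load.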

HwJhx avatar Sep 04 '23 12:09 HwJhx

My GPU Info as below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
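Since the GPU shows 0 MiB in use, the kill almost certainly happened before the model ever reached the T4, i.e. during the CPU-side checkpoint load. One way to confirm an OOM kill (a sketch, assuming a Linux host; reading the kernel log may require root) is to search for the OOM killer's log lines:

```shell
# Look for OOM-killer entries around the time of the failure.
# Either dmesg or journalctl -k should show a line like
# "Out of memory: Killed process 16713 (python3)" if the kernel killed it.
(dmesg 2>/dev/null; journalctl -k --no-pager 2>/dev/null) \
  | grep -iE "out of memory|killed process" || true
```

If such a line appears with the failing PID, the fix is more RAM (or swap), not a CUDA/driver change.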

HwJhx avatar Sep 04 '23 12:09 HwJhx

Did you figure it out? I have the same problem

BoazimMatrix avatar Oct 01 '23 08:10 BoazimMatrix

Were you able to run Code Llama successfully using the codellama repository?

It's been nearly a year since this was opened, so I'm going to close it for now, but I'll reopen it if you send another message.

xNul avatar Aug 01 '24 00:08 xNul