llama-stack-apps
worker_process_entrypoint FAILED
I tried this on Ubuntu 22.04, but it fails with the following error:
```
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702] failed (exitcode: -9) local_rank: 0 (pid: 30194) of fn: worker_process_entrypoint (start_method: fork)
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702] Traceback (most recent call last):
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702]   File "/home/aleya/Work/Habibi/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 659, in _poll
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702]     self._pc.join(-1)
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702]   File "/home/aleya/Work/Habibi/llama/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 170, in join
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702]     raise ProcessExitedException(
E0724 19:33:34.565000 128818126430656 torch/distributed/elastic/multiprocessing/api.py:702] torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL
Process ForkProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/Habib/llama/venv/lib/python3.10/site-packages/llama_toolchain/inference/parallel_utils.py", line 175, in launch_dist_group
    elastic_launch(launch_config, entrypoint=worker_process_entrypoint)(
  File "/home/Habib/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/Habib/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
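For context: `exitcode: -9` means the worker process was terminated with SIGKILL, which on Linux most often comes from the kernel OOM killer when the model weights do not fit in available RAM. Lines like `Out of memory: Killed process` in `dmesg` would confirm this. A minimal sketch (not part of llama-stack-apps; the helper name is hypothetical) to check available memory before launching, assuming a Linux host with `/proc/meminfo`:

```python
def available_ram_gib(meminfo_path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in GiB."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # /proc/meminfo reports values in kB
                kib = int(line.split()[1])
                return kib / (1024 * 1024)
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

if __name__ == "__main__":
    # An 8B-parameter model in bf16 needs roughly 16 GiB for weights alone,
    # so an available figure well below that would explain the SIGKILL.
    print(f"Available RAM: {available_ram_gib():.1f} GiB")
```

If available memory is the bottleneck, running a smaller or quantized model, or adding swap, are the usual workarounds.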