server crashed for some reason, unable to proceed
I ran the following command but keep getting an error.
```shell
python -m mii.entrypoints.openai_api_server \
    --model "/logs/llama-2-70b-chat/" \
    --port 8000 \
    --host 0.0.0.0 \
    --tensor-parallel 2
```
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mii/entrypoints/openai_api_server.py", line 506, in
```
I don't know how to solve this issue.
Hi @Archmilio, could you try running the model in a pipeline? I suspect that the server is crashing while loading the model, but since it runs as a separate process the real error is not being shown:
```python
import mii

# Load the model directly; if the server is failing during model load, the real error will surface here.
pipe = mii.pipeline("/logs/llama-2-70b-chat/", tensor_parallel=2)
print(pipe("DeepSpeed is"))
```
Save this as `example.py` and run it with `deepspeed --num_gpus 2 example.py`.
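In case it helps, here is a slightly fuller sketch of `example.py`; the prompt list and the `max_new_tokens` value are just illustrative choices, not required arguments:

```python
# example.py -- minimal sketch for checking that the model loads in a MII pipeline.
# Assumes DeepSpeed-MII is installed and the checkpoint path below is valid.
import mii

# Use the same tensor-parallel degree you passed to the server.
pipe = mii.pipeline("/logs/llama-2-70b-chat/", tensor_parallel=2)

# Generate a short continuation; max_new_tokens=64 is only an example value.
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
for response in responses:
    print(response)
```

If this run crashes, the traceback it prints should show the underlying model-loading error; if it loads and generates fine, we can narrow the problem down to the server entrypoint itself.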