DeepSpeed-MII
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use).
When I try to start the server with deepspeed --num_gpus 2 xxx.py, this error occurs.
But if I start it with python3 xxx.py, it works well.
I want to deploy llama-70b (roughly 140 GB of weights) on 2 A100s (80 GB each), so I have to use deepspeed to start the server.
Here is the log output:
[2024-01-20 10:15:26,416] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:26,676] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:26,846] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-01-20 10:15:26,846] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-01-20 10:15:26,846] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-01-20 10:15:26,846] [INFO] [launch.py:163:main] dist_world_size=2
[2024-01-20 10:15:26,846] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-01-20 10:15:26,967] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:26,967] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,150] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,150] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,259] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-01-20 10:15:27,260] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-01-20 10:15:27,260] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-01-20 10:15:27,260] [INFO] [launch.py:163:main] dist_world_size=2
[2024-01-20 10:15:27,260] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-01-20 10:15:28,970] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,041] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,083] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,117] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,509] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,576] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,576] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-20 10:15:29,804] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,805] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W socket.cpp:436] [c10d] The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use).
[W socket.cpp:436] [c10d] The server socket has failed to bind to 0.0.0.0:29700 (errno: 98 - Address already in use).
[E socket.cpp:472] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/launch/multi_gpu_server.py", line 105, in <module>
main()
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/launch/multi_gpu_server.py", line 98, in main
inference_pipeline = async_pipeline(args.model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/api.py", line 167, in async_pipeline
inference_engine = load_model(model_config)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/modeling/models.py", line 14, in load_model
init_distributed(model_config)
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/utils.py", line 187, in init_distributed
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(seconds=1e9))
File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 670, in init_distributed
cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 120, in __init__
self.init_process_group(backend, timeout, init_method, rank, world_size)
File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 146, in init_process_group
torch.distributed.init_process_group(backend,
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1141, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 241, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 172, in _create_c10d_store
return TCPStore(
^^^^^^^^^
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29700 (errno: 98 - Address already in use).
[2024-01-20 10:15:29,822] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,878] [INFO] [engine_v2.py:82:__init__] Building model...
[2024-01-20 10:15:29,944] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
[2024-01-20 10:15:30,593] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
[2024-01-20 10:15:30,848] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648350
[2024-01-20 10:15:30,848] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648351
[2024-01-20 10:15:31,004] [ERROR] [launch.py:321:sigkill_handler] ['/home/infer/miniconda3/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'llama-deployment', '--load-balancer-port', '50050', '--restful-gateway-port', '28080', '--restful-gateway-host', 'localhost', '--restful-gateway-procs', '32', '--server-port', '50051', '--zmq-port', '25555', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL21udC9MbGFtYS0yLTdiLWNoYXQtaGYiLCAidG9rZW5pemVyIjogIi9tbnQvTGxhbWEtMi03Yi1jaGF0LWhmIiwgInRhc2siOiAidGV4dC1nZW5lcmF0aW9uIiwgInRlbnNvcl9wYXJhbGxlbCI6IDIsICJpbmZlcmVuY2VfZW5naW5lX2NvbmZpZyI6IHsidGVuc29yX3BhcmFsbGVsIjogeyJ0cF9zaXplIjogMn0sICJzdGF0ZV9tYW5hZ2VyIjogeyJtYXhfdHJhY2tlZF9zZXF1ZW5jZXMiOiAyMDQ4LCAibWF4X3JhZ2dlZF9iYXRjaF9zaXplIjogNzY4LCAibWF4X3JhZ2dlZF9zZXF1ZW5jZV9jb3VudCI6IDUxMiwgIm1heF9jb250ZXh0IjogODE5MiwgIm1lbW9yeV9jb25maWciOiB7Im1vZGUiOiAicmVzZXJ2ZSIsICJzaXplIjogMTAwMDAwMDAwMH0sICJvZmZsb2FkIjogZmFsc2V9fSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NzAwLCAiem1xX3BvcnRfbnVtYmVyIjogMjU1NTUsICJyZXBsaWNhX251bSI6IDEsICJyZXBsaWNhX2NvbmZpZ3MiOiBbeyJob3N0bmFtZSI6ICJsb2NhbGhvc3QiLCAidGVuc29yX3BhcmFsbGVsX3BvcnRzIjogWzUwMDUxLCA1MDA1Ml0sICJ0b3JjaF9kaXN0X3BvcnQiOiAyOTcwMCwgImdwdV9pbmRpY2VzIjogWzAsIDFdLCAiem1xX3BvcnQiOiAyNTU1NX1dLCAiZGV2aWNlX21hcCI6ICJhdXRvIiwgIm1heF9sZW5ndGgiOiBudWxsLCAiYWxsX3Jhbmtfb3V0cHV0IjogZmFsc2UsICJzeW5jX2RlYnVnIjogZmFsc2UsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0='] exits with return code = 1
[2024-01-20 10:15:31,968] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:31,968] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:32,151] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:32,151] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
File "/home/infer/deepspeed-fastgen/quest.py", line 26, in <module>
client = mii.serve("/mnt/Llama-2-7b-chat-hf", deployment_name="llama-deployment", replica_num=1, #replica_num=2 tensor_parallel=2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/api.py", line 124, in serve
import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
File "/tmp/mii_cache/llama-deployment/score.py", line 33, in init
mii.backend.MIIServer(mii_config)
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/backend/server.py", line 47, in __init__
self._wait_until_server_is_live(processes,
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/backend/server.py", line 62, in _wait_until_server_is_live
raise RuntimeError(
RuntimeError: server crashed for some reason, unable to proceed
[2024-01-20 10:15:33,306] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1647573
[2024-01-20 10:15:33,306] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1647574
[2024-01-20 10:15:33,342] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648352
[2024-01-20 10:15:33,404] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648353
[2024-01-20 10:15:33,463] [INFO] [launch.py:324:sigkill_handler] Main process received SIGTERM, exiting
[2024-01-20 10:15:33,917] [ERROR] [launch.py:321:sigkill_handler] ['/home/infer/miniconda3/bin/python', '-u', 'quest.py', '--local_rank=1'] exits with return code = 1
At first, I thought another process was simply occupying the port, so I changed it to 29700. But as you can see, that did not solve the problem. What should I do? My code is just like the example (but using llama-7b):
import mii
client = mii.serve("/mnt/Llama-2-7b-chat-hf", deployment_name="llama-deployment", tensor_parallel=2)
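For reference, a minimal sketch of how the port change above was presumably made. torch_dist_port=29700 matches the decoded model config in the log; passing it as a keyword to mii.serve is an assumption about how that value was set:

import mii

# Assumption: torch_dist_port is the model-config field shown in the base64 config
# above, and mii.serve forwards it into the config. The port still collides because
# two launchers race for the same rendezvous, not because 29700 itself is bad.
client = mii.serve(
    "/mnt/Llama-2-7b-chat-hf",
    deployment_name="llama-deployment",
    tensor_parallel=2,
    torch_dist_port=29700,
)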
Hi @Chenhzjs, if you use mii.serve to start your server, you do not need the deepspeed launcher to take advantage of tensor parallelism. mii.serve will call the DeepSpeed launcher itself, so when you run your script with deepspeed --num_gpus 2 you are attempting to launch 2 inference servers (and thus you see the "address already in use" error).
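In other words, a minimal sketch of the intended usage (the script name serve_llama.py is just an example): keep the mii.serve call in the script and start it with plain python3, letting MII spawn the per-GPU workers itself.

# serve_llama.py -- launch with `python3 serve_llama.py`, NOT `deepspeed --num_gpus 2 ...`
import mii

# mii.serve invokes the DeepSpeed launcher internally for the tensor-parallel workers,
# so only one rendezvous is created on torch_dist_port.
client = mii.serve(
    "/mnt/Llama-2-7b-chat-hf",
    deployment_name="llama-deployment",
    tensor_parallel=2,
)

response = client.generate(["DeepSpeed is"], max_new_tokens=64)
print(response)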
This section of code has the same issue:
from mii import pipeline
pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.1")
output = pipe(["Hello, my name is", "DeepSpeed is"], max_new_tokens=128)
print(output)
error info:
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use)
It uses pipeline only, and there is no additional call to mii.serve.
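In the pipeline-only case the bind fails on torch's default rendezvous port 29500, so a likely culprit is a stale process (for example a previous run that did not exit cleanly) still holding that port. A minimal sketch, assuming the pipeline's deepspeed.init_distributed call honors the standard torch.distributed MASTER_PORT variable, that moves the rendezvous to a free port before the pipeline is built:

import os
import socket

from mii import pipeline

def first_free_port(start=29500, end=29600):
    # Probe for a local port that nothing is currently listening on.
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            if s.connect_ex(("127.0.0.1", port)) != 0:
                return port
    raise RuntimeError("no free rendezvous port found")

# Assumption: MASTER_PORT is respected by the env:// rendezvous that
# deepspeed.init_distributed falls back to when no launcher set it.
os.environ.setdefault("MASTER_PORT", str(first_free_port()))

pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.1")
output = pipe(["Hello, my name is", "DeepSpeed is"], max_new_tokens=128)
print(output)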