DeepSpeed
[REQUEST] Model serving via deepspeed's inference module
Is your feature request related to a problem? Please describe. No
Describe the solution you'd like
I am trying to run my model-serving code in a model-parallel fashion. The tutorial shows how to run code on multiple GPUs, but its input data is predefined, so it cannot be used for serving. My original code uses FastAPI to do the serving. When I launch it with deepspeed --num_gpus n example.py, the FastAPI server is also started n times, which causes port conflicts.
Describe alternatives you've considered: Do I have to first start the model in parallel with DeepSpeed in one script, then start another script for FastAPI, and finally connect the two somehow?
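Each process started by deepspeed --num_gpus n runs the whole script, so every top-level call, including the server start-up, happens n times. Below is a minimal sketch of gating the port binding on rank, assuming the launcher exports RANK in addition to the LOCAL_RANK and WORLD_SIZE variables that the snippets in this thread read. rank_info and should_bind_port are hypothetical helpers. Gating alone is not enough for model parallelism, because the other ranks still have to take part in every forward pass.

```python
import os
from typing import Mapping, Optional, Tuple


def rank_info(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int, int]:
    """Return (rank, local_rank, world_size) from launcher environment variables.

    Assumption: the launcher sets RANK, LOCAL_RANK and WORLD_SIZE for each
    process; missing values fall back to a single-process run.
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", str(local_rank)))  # on one node, rank == local_rank
    return rank, local_rank, world_size


def should_bind_port(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: only global rank 0 starts the HTTP server."""
    rank, _, _ = rank_info(env)
    return rank == 0


# Simulate the environments of the 4 processes started by `deepspeed --num_gpus 4`.
envs = [{"RANK": str(r), "LOCAL_RANK": str(r), "WORLD_SIZE": "4"} for r in range(4)]
print([should_bind_port(e) for e in envs])  # [True, False, False, False]
```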
Additional context None.
Hi @callzhang
Could you please provide a test script so that I can try it on my end and understand the problem better? Thanks, Reza
Here is the minimal code I tried:
from fastapi import FastAPI, Request, Response, Query
from transformers import pipeline
import deepspeed, torch, os, uvicorn

app = FastAPI()
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='gpt2', device=local_rank)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_method='auto')

@app.get("/gen")
def generate(text):
    return generator(text, max_length=100)

if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(f'initiating server on rank: {local_rank}')
    uvicorn.run(
        "min_example_deepspeed_mp:app",
        host="0.0.0.0", port=8500,
        log_level="info",
        workers=1
    )
Then I ran deepspeed --num_gpus 2 min_example_deepspeed_mp.py
and I got the following error:
[2021-11-03 01:33:39,359] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2021-11-03 01:33:39,373] [INFO] [runner.py:360:main] cmd = /home/stardust/anaconda3/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 nlp/sentence_generation/min_example_deepspeed_mp.py
[2021-11-03 01:33:39,993] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2021-11-03 01:33:39,994] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=2, node_rank=0
[2021-11-03 01:33:39,994] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2021-11-03 01:33:39,994] [INFO] [launch.py:102:main] dist_world_size=2
[2021-11-03 01:33:39,994] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1
initiating server on rank: 1
initiating server on rank: 0
initiating server on rank: 0
Traceback (most recent call last):
File "nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
uvicorn.run(
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
server.run()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
return asyncio.run(self.serve(sockets=sockets))
File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1501, in uvloop.loop.Loop.run_until_complete
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 76, in serve
config.load()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/config.py", line 448, in load
self.loaded_app = import_from_string(self.app)
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/home/stardust/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/stardust/algorithms-playground/nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
uvicorn.run(
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
server.run()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
return asyncio.run(self.serve(sockets=sockets))
File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 33, in run
raise RuntimeError(
RuntimeError: asyncio.run() cannot be called from a running event loop
initiating server on rank: 1
Traceback (most recent call last):
File "nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
uvicorn.run(
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
server.run()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
return asyncio.run(self.serve(sockets=sockets))
File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1501, in uvloop.loop.Loop.run_until_complete
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 76, in serve
config.load()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/config.py", line 448, in load
self.loaded_app = import_from_string(self.app)
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/home/stardust/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/stardust/algorithms-playground/nlp/sentence_generation/min_example_deepspeed_mp.py", line 17, in <module>
uvicorn.run(
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/main.py", line 447, in run
server.run()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/uvicorn/server.py", line 68, in run
return asyncio.run(self.serve(sockets=sockets))
File "/home/stardust/anaconda3/lib/python3.8/asyncio/runners.py", line 33, in run
raise RuntimeError(
RuntimeError: asyncio.run() cannot be called from a running event loop
sys:1: RuntimeWarning: coroutine 'Server.serve' was never awaited
sys:1: RuntimeWarning: coroutine 'Server.serve' was never awaited
Killing subprocess 8312
Killing subprocess 8313
Traceback (most recent call last):
File "/home/stardust/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/stardust/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/stardust/anaconda3/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
main()
File "/home/stardust/anaconda3/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 161, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/stardust/anaconda3/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stardust/anaconda3/bin/python', '-u', 'nlp/sentence_generation/min_example_deepspeed_mp.py', '--local_rank=1']' returned non-zero exit status 1.
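The traceback points at the cause of the RuntimeError: uvicorn.run receives the import string "min_example_deepspeed_mp:app", so uvicorn imports the script under that module name. Because the script is already running as __main__, that import executes its top-level code a second time (which is why "initiating server on rank: 0" is printed twice), and the second execution reaches uvicorn.run again inside the event loop that is already running. The stdlib sketch below reproduces only the double execution; demo_module is a hypothetical stand-in for the script.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical stand-in for min_example_deepspeed_mp.py. It records every
# execution of its top-level code, then imports itself by module name, which is
# what uvicorn does when given the string "min_example_deepspeed_mp:app".
SCRIPT = """
import sys
sys.demo_executions.append(__name__)
if __name__ == "__main__":
    import demo_module  # same file, new module object: top-level code runs again
"""


def run_script_that_imports_itself() -> list:
    sys.demo_executions = []  # shared through the sys module by both executions
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_module.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            runpy.run_path(path, run_name="__main__")  # like `python demo_module.py`
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_module", None)
    return sys.demo_executions


print(run_script_that_imports_itself())  # ['__main__', 'demo_module']
```

Passing the application object itself, uvicorn.run(app, host="0.0.0.0", port=8500), should avoid this re-import because uvicorn then has nothing to import; the import-string form is, as far as I know, only required for options such as reload or multiple workers. This fixes the RuntimeError but not the model-parallel part: the non-zero ranks still need to join each forward pass.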
Thanks, I will try this on my end.
Any update on this? Is there another recommended way to do this, for instance if we wanted to run with uvicorn and therefore couldn't use the deepspeed launcher?
Can you resolve it with gunicorn?
Any updates on this issue, @RezaYazdaniAminabadi?
Any update on this?
Any update or workaround?
I face the same problem in my project.
Code:
from transformers import pipeline
import transformers
import deepspeed
import torch
import os
from transformers.models.t5.modeling_t5 import T5Block
import sys
import torch.distributed as dist

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '2'))
pipe = pipeline("text2text-generation", model="t5-v1_1-small", device=local_rank)
pipe.model = deepspeed.init_inference(
    pipe.model,
    mp_size=world_size,
    dtype=torch.float
)
pipe.device = torch.device(f'cuda:{local_rank}')

if not dist.is_initialized() or dist.get_rank() == 0:
    # Rank 0: run the web server and forward every request to the other ranks.
    from flask import Flask
    app = Flask(__name__)

    @app.route("/")
    def hello_world():
        object_list = ["Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy"]
        # Send the prompt to all other ranks so they join this forward pass.
        dist.broadcast_object_list(object_list, src=0)
        output = pipe(object_list)
        print(output)
        return "<p>Hello, World!</p>%s" % output

    app.run()
else:
    # Other ranks: wait for a prompt from rank 0, then run the same forward pass.
    while True:
        object_list = [None]
        dist.broadcast_object_list(object_list, src=0)
        output = pipe(object_list)
root@000d2e0398b9:/workspace# curl http://127.0.0.1:5000
<p>Hello, World!</p>[{'generated_text': 'd review: this is the best cast iron skillet. Great review! Great review! Great'}]
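I used torch.distributed.broadcast_object_list to solve this problem: rank 0 runs the Flask server and broadcasts each request's input to the other ranks, which wait in a loop and run the same pipeline call, so every model-parallel rank takes part in each forward pass.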
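The broadcast workaround above follows a leader/follower pattern that can be simulated without GPUs. In this sketch, threads stand in for ranks and FakeBroadcast is a hypothetical in-process stand-in for dist.broadcast_object_list(object_list, src=0): rank 0 (the web handler) publishes each prompt, the follower ranks block until it arrives, and rank 0 only returns once every rank has contributed its piece, mimicking why all model-parallel ranks must run every forward pass. model_shard, request_lock and the STOP sentinel are illustrative assumptions, not DeepSpeed or PyTorch APIs.

```python
import queue
import threading
from typing import List, Optional

WORLD_SIZE = 2
STOP = None  # hypothetical sentinel that tells follower ranks to exit


class FakeBroadcast:
    """In-process stand-in for dist.broadcast_object_list(..., src=0)."""

    def __init__(self, world_size: int) -> None:
        self.inboxes = [queue.Queue() for _ in range(world_size)]

    def broadcast(self, obj: Optional[str], rank: int) -> Optional[str]:
        if rank == 0:  # source rank sends its object to every other rank
            for inbox in self.inboxes[1:]:
                inbox.put(obj)
            return obj
        return self.inboxes[rank].get()  # other ranks block until it arrives


def model_shard(text: str, rank: int) -> str:
    # Hypothetical per-rank piece of a model-parallel forward pass.
    return f"{text}@rank{rank}"


def follower_loop(rank: int, bus: FakeBroadcast, gather: queue.Queue) -> None:
    # Equivalent of the `while True:` loop run by ranks other than 0.
    while True:
        text = bus.broadcast(None, rank)
        if text is STOP:
            return
        gather.put((rank, model_shard(text, rank)))


request_lock = threading.Lock()  # serialize requests if the server is threaded


def handle_request(text: str, bus: FakeBroadcast, gather: queue.Queue) -> str:
    # Equivalent of the rank-0 HTTP handler.
    with request_lock:
        bus.broadcast(text, 0)
        gather.put((0, model_shard(text, 0)))
        parts = sorted(gather.get() for _ in range(WORLD_SIZE))  # wait for every rank
    return " + ".join(part for _, part in parts)


def serve(requests: List[str]) -> List[str]:
    bus = FakeBroadcast(WORLD_SIZE)
    gather: queue.Queue = queue.Queue()
    followers = [
        threading.Thread(target=follower_loop, args=(r, bus, gather))
        for r in range(1, WORLD_SIZE)
    ]
    for t in followers:
        t.start()
    try:
        return [handle_request(text, bus, gather) for text in requests]
    finally:
        bus.broadcast(STOP, 0)  # let the follower loops exit
        for t in followers:
            t.join()


print(serve(["hi", "there"]))  # ['hi@rank0 + hi@rank1', 'there@rank0 + there@rank1']
```

The lock reflects a choice the original snippet leaves implicit: Flask's development server handles requests in threads by default in recent versions, so two concurrent requests could otherwise interleave their broadcasts. The stop sentinel gives the follower loops a clean exit instead of blocking forever.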