DeepSpeed-MII
waiting for server to start...
Hello, I started a deployment on one node with 4 GPUs and set tensor_parallel to 2. The program is always waiting for server to start.
My code is:
My hostfile is: 127.0.0.1 slots=2
@yunll are you able to see any GPU memory usage (via nvidia-smi)? I am wondering if there is a problem loading the model. Either way, I think we could improve the feedback to the user to be more descriptive about what the server is doing in the background.
Also, could you try without the gRPC server? Set deployment_type=mii.DeploymentType.NON_PERSISTENT in your call to mii.deploy() and launch with deepspeed --num_gpus 2 your_script.py.
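For reference, a minimal non-persistent run might look like the sketch below; the model name, deployment name, and prompts are placeholders, and it assumes the legacy mii.deploy / mii.mii_query_handle API used elsewhere in this thread:

import mii

# Sketch only: model, deployment name, and prompts are placeholders.
mii.deploy(task="text-generation",
           model="bigscience/bloom-560m",
           deployment_name="bloom_deployment",
           deployment_type=mii.DeploymentType.NON_PERSISTENT)

# With NON_PERSISTENT there is no standalone gRPC server;
# the query runs in-process on each rank.
generator = mii.mii_query_handle("bloom_deployment")
result = generator.query({"query": ["DeepSpeed is", "Seattle is"]},
                         max_new_tokens=30)
print(result)

Launching it with deepspeed --num_gpus 2 sketch.py keeps everything in the foreground, so a hang during model loading shows up directly in the console instead of behind waiting for server to start.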
I'm facing the same issue with both persistent and non-persistent deployments. It's not loading the model on the GPUs. I've tried deepspeed and zero2 and zero3.
import mii

model_id = "codellama/CodeLlama-7b-Instruct-hf"
model_path = ".cache/huggingface/hub/"

mii_configs = {"tensor_parallel": 5,
               "dtype": "fp16",
               "trust_remote_code": True}

mii.deploy(task="text-generation",
           model=model_id,
           deployment_name="mii",
           model_path=model_path + model_id,
           mii_config=mii_configs,
           enable_deepspeed=True,
           enable_zero=False,
           deployment_type=mii.constants.DeploymentType.NON_PERSISTENT)
[2023-09-04 12:23:19,159] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['.local/lib/python3.10/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.10.2, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
shared memory (/dev/shm) size .... 28.73 GB
@infosechoudini what behavior are you seeing when you load the model with non-persistent deployment type or just using DeepSpeed? Does a simple script like the following run for you?
import torch
import deepspeed
import os
from transformers import pipeline

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

task_name = "text-generation"
model_name = "gpt2"
input_strs = ["DeepSpeed is", "Microsoft is"]

def run():
    pipe = pipeline(task_name, model_name, torch_dtype=torch.float16, device=local_rank)
    pipe.model = deepspeed.init_inference(
        pipe.model,
        replace_with_kernel_inject=True,
        mp_size=world_size,
        dtype=torch.float16,
    )
    output = pipe(input_strs)
    print(output)

if __name__ == "__main__":
    run()
Run it with: deepspeed script.py
Hey,
DeepSpeed works fine. I just finished training a model with DeepSpeed yesterday. I was messing around with it but couldn't find a solution.
It just hangs on waiting for server to start and then crashes after it times out.
@infosechoudini I want to determine whether there is a bug in MII or a problem in your environment that is causing this hang. I see that you are setting "tensor_parallel": 5. I have seen issues in the past with model sharding when using an odd number of GPUs. Could you try running with 4 GPUs?
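Concretely, that change is small; a sketch of the adjusted config, assuming the rest of the deployment call from above stays the same:

mii_configs = {"tensor_parallel": 4,  # even GPU count; odd-way sharding has caused issues
               "dtype": "fp16",
               "trust_remote_code": True}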
Hi @mrwyattii, may I ask how to keep the RESTful server alive? Here is my script:
import mii

mii_configs = {
    "tensor_parallel": 2,
    "dtype": "fp16",
    "enable_restful_api": True,
    "restful_api_port": 35215,
    "skip_model_check": True
}

mii.deploy(task="text-generation",
           model="/path/to/my/model",
           deployment_name="MY_DEPLOYMENT",
           mii_config=mii_configs,
           deployment_type=mii.DeploymentType.NON_PERSISTENT)
It seems that after I ran deepspeed --num_gpus 2 api.py, the process just exited. The model was loaded onto the GPUs, but the server did not stay alive. Can you help me out?
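If the intent is a long-lived RESTful server, one likely culprit is the NON_PERSISTENT deployment type, which runs in-process and exits with the script. A minimal sketch of a persistent variant, assuming the legacy API's LOCAL deployment type and a plain python launch:

import mii

mii_configs = {
    "tensor_parallel": 2,
    "dtype": "fp16",
    "enable_restful_api": True,
    "restful_api_port": 35215,
    "skip_model_check": True
}

# Assumption: DeploymentType.LOCAL starts a persistent gRPC server
# (plus the RESTful gateway) that outlives this script.
mii.deploy(task="text-generation",
           model="/path/to/my/model",
           deployment_name="MY_DEPLOYMENT",
           mii_config=mii_configs,
           deployment_type=mii.DeploymentType.LOCAL)

# Launch with: python api.py (not the deepspeed launcher);
# shut down later with mii.terminate("MY_DEPLOYMENT").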
Hi @mrwyattii, what could be the potential reasons for server.py to keep waiting for server to start? When I ran the test.py you gave, it seemed to work, so I don't think the model itself is the cause. But when I run server.py, it waits forever for the server to come up. nvidia-smi shows only 448 MB used on each GPU while I try to load the 7B model, so I guess the model is not being loaded properly. Why is this different between persistent and non-persistent deployment?