[Question] How to run Mixtral inference on multiple nodes?
**Describe the bug**
The program is killed by a watchdog timeout when I run DeepSpeed across multiple nodes.
**To Reproduce**
Steps to reproduce the behavior:
- Simple inference script to reproduce, launched with:
```bash
deepspeed \
  --hostfile=./hostfile \
  --include="node0:2,3@node1:0,1" \
  mixtralDs.py \
  --deepspeed_config ./ds_config.json
```
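For reference, the `--hostfile` DeepSpeed expects lists one `<hostname> slots=<num_gpus>` entry per line. A minimal sketch matching the command above (the node names come from the `--include` filter; the slot counts are assumptions):

```
node0 slots=4
node1 slots=4
```

The `--include "node0:2,3@node1:0,1"` filter then restricts the launcher to GPU indices 2 and 3 on node0 and 0 and 1 on node1, i.e. four ranks in total, which matches `mp_size=4` in the script.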
- mixtralDs.py:

```python
import os

import torch
import deepspeed
from transformers import MixtralConfig, MixtralModel


def run_mixtral_ds():
    # Bind this process to the GPU assigned by the DeepSpeed launcher.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    # Half-sized Mixtral configuration so the model fits on the test GPUs.
    configuration = MixtralConfig(
        vocab_size=32000,
        hidden_size=4096 // 2,
        intermediate_size=14336 // 2,
        num_hidden_layers=32 // 2,
        num_attention_heads=32 // 2,
        num_key_value_heads=8 // 2,
        hidden_act="silu",
        max_position_embeddings=4096 * 32,
        initializer_range=0.02,
        rms_norm_eps=1e-5,
        use_cache=True,
        pad_token_id=None,
        bos_token_id=1,
        eos_token_id=2,
        tie_word_embeddings=False,
        rope_theta=1e6,
        sliding_window=None,
        attention_dropout=0.0,
        num_experts_per_tok=2,
        num_local_experts=8,
        output_router_logits=False,
        router_aux_loss_coef=0.001,
    )
    mixtralmodel = MixtralModel(config=configuration).to(device)

    # Random token ids as a dummy batch: 4 sequences of length 30.
    input_ids = torch.randint(
        low=0, high=configuration.vocab_size, size=(4, 30)
    ).to(device)

    # Shard the model across 4 GPUs (2 per node) for tensor parallelism.
    ds_model = deepspeed.init_inference(mixtralmodel, mp_size=4, dtype=torch.float16)
    res = ds_model(input_ids)
    print(res)


if __name__ == '__main__':
    run_mixtral_ds()
```
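Not part of the original script, but since the crash is a watchdog timeout, one thing that may be worth trying is initializing the process group explicitly with a longer timeout before building the model. A minimal sketch (the backend choice and the 60-minute value are assumptions, not a confirmed fix):

```python
from datetime import timedelta

import deepspeed

# Sketch (assumption, not a confirmed fix): explicitly initialize the
# distributed state with a longer watchdog timeout before constructing
# the model; 60 minutes is an arbitrary example value.
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(minutes=60))
```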
**Expected behavior**
The program runs across both nodes and prints a result.
**ds_report output**
```
DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import('pkg_resources').require('deepspeed==0.14.3+0fc19b6a')
[2024-05-16 21:46:15,312] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-16 21:46:15,444] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/archlab/zyl/copy/jlq/anaconda3/lib/python3.11/site-packages/torch']
torch version .................... 2.3.0+cu121
deepspeed install path ........... ['/home/archlab/zyl/copy/jlq/project/DeepSpeed/deepspeed']
deepspeed info ................... 0.14.3+0fc19b6a, 0fc19b6a, master
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 125.89 GB
```
**System info (please complete the following information):**
- OS: Ubuntu 18.04
- GPU count and types: two machines with 2x V100s each
- DeepSpeed version: deepspeed==0.14.3+0fc19b6a
- Hugging Face Transformers/Accelerate versions: transformers==4.40.2, accelerate==0.30.0
- Python version: 3.11
**Docker context**
Not using Docker.
**Additional context**
There is also an error when I run the DeepSpeedExamples in the same environment, specifically DeepSpeedExamples/inference/huggingface/text-generation/run-generation-script/test-gpt.sh.
The error is:
Then I changed the parameters in deepspeed.init_inference:
```python
model = deepspeed.init_inference(model,
                                 mp_size=1,
                                 dtype=(torch.half if args.fp16 else torch.float),
                                 replace_with_kernel_inject=True)
```
Then the error is:
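As an aside, in recent DeepSpeed releases `mp_size` is a deprecated alias for the `tensor_parallel` config. A sketch of the equivalent call (assuming the same model and the 4-way tensor-parallel layout of the Mixtral script above):

```python
import torch
import deepspeed

# Sketch: tensor_parallel replaces the deprecated mp_size alias.
# tp_size=4 is an assumption matching the four GPUs selected by --include.
ds_model = deepspeed.init_inference(
    model,  # the Hugging Face model constructed earlier
    tensor_parallel={"tp_size": 4},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```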
What should I do if I want to run DeepSpeed inference across multiple nodes?
Thanks!