[Question] How to run Mixtral inference on multiple nodes?
**Describe the bug**
The program is killed by a watchdog timeout when I run DeepSpeed across multiple nodes.
**To Reproduce**
Steps to reproduce the behavior:
- Simple inference script to reproduce, launched with:
```bash
deepspeed \
  --hostfile=./hostfile \
  --include="node0:2,3@node1:0,1" \
  mixtralDs.py \
  --deepspeed_config ./ds_config.json
```
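For reference, the `--hostfile` DeepSpeed expects lists one `<hostname> slots=<num_gpus>` entry per line. A minimal sketch matching the command above (the node names come from the `--include` filter; the slot counts are assumptions):

```
node0 slots=4
node1 slots=4
```

The `--include "node0:2,3@node1:0,1"` filter then restricts the launcher to GPU indices 2 and 3 on node0 and 0 and 1 on node1, i.e. four ranks in total, which matches `mp_size=4` in the script.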
- mixtralDs.py:

```python
import os

import torch
import deepspeed
from transformers import MixtralConfig, MixtralModel


def run_mixtral_ds():
    # Bind this process to the GPU assigned by the DeepSpeed launcher.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    # Half-sized Mixtral configuration so the model fits on the test GPUs.
    configuration = MixtralConfig(
        vocab_size=32000,
        hidden_size=4096 // 2,
        intermediate_size=14336 // 2,
        num_hidden_layers=32 // 2,
        num_attention_heads=32 // 2,
        num_key_value_heads=8 // 2,
        hidden_act="silu",
        max_position_embeddings=4096 * 32,
        initializer_range=0.02,
        rms_norm_eps=1e-5,
        use_cache=True,
        pad_token_id=None,
        bos_token_id=1,
        eos_token_id=2,
        tie_word_embeddings=False,
        rope_theta=1e6,
        sliding_window=None,
        attention_dropout=0.0,
        num_experts_per_tok=2,
        num_local_experts=8,
        output_router_logits=False,
        router_aux_loss_coef=0.001,
    )
    mixtralmodel = MixtralModel(config=configuration).to(device)

    # Random token ids as a dummy batch: 4 sequences of length 30.
    input_ids = torch.randint(
        low=0, high=configuration.vocab_size, size=(4, 30)
    ).to(device)

    # Shard the model across 4 GPUs (2 per node) for tensor parallelism.
    ds_model = deepspeed.init_inference(mixtralmodel, mp_size=4, dtype=torch.float16)
    res = ds_model(input_ids)
    print(res)


if __name__ == '__main__':
    run_mixtral_ds()
```
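Not part of the original script, but since the crash is a watchdog timeout, one thing that may be worth trying is initializing the process group explicitly with a longer timeout before building the model. A minimal sketch (the backend choice and the 60-minute value are assumptions, not a confirmed fix):

```python
from datetime import timedelta

import deepspeed

# Sketch (assumption, not a confirmed fix): explicitly initialize the
# distributed state with a longer watchdog timeout before constructing
# the model; 60 minutes is an arbitrary example value.
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(minutes=60))
```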
**Expected behavior**
The program runs across both nodes and prints a result.
**ds_report output**
```
DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import('pkg_resources').require('deepspeed==0.14.3+0fc19b6a')
[2024-05-16 21:46:15,312] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-16 21:46:15,444] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/archlab/zyl/copy/jlq/anaconda3/lib/python3.11/site-packages/torch']
torch version .................... 2.3.0+cu121
deepspeed install path ........... ['/home/archlab/zyl/copy/jlq/project/DeepSpeed/deepspeed']
deepspeed info ................... 0.14.3+0fc19b6a, 0fc19b6a, master
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 125.89 GB
```
**System info (please complete the following information):**
- OS: Ubuntu 18.04
- GPU count and types: two machines with 2x V100s each
- DeepSpeed version: deepspeed==0.14.3+0fc19b6a
- Hugging Face Transformers/Accelerate versions: transformers==4.40.2, accelerate==0.30.0
- Python version: 3.11
**Docker context**
Not using Docker.
**Additional context**
There is also an error when I run the DeepSpeedExamples in the same environment, specifically DeepSpeedExamples/inference/huggingface/text-generation/run-generation-script/test-gpt.sh.
The error is:
Then I changed the parameters in deepspeed.init_inference:
```python
model = deepspeed.init_inference(model,
                                 mp_size=1,
                                 dtype=(torch.half if args.fp16 else torch.float),
                                 replace_with_kernel_inject=True)
```
Then the error is:
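As an aside, in recent DeepSpeed releases `mp_size` is a deprecated alias for the `tensor_parallel` config. A sketch of the equivalent call (assuming the same model and the 4-way tensor-parallel layout of the Mixtral script above):

```python
import torch
import deepspeed

# Sketch: tensor_parallel replaces the deprecated mp_size alias.
# tp_size=4 is an assumption matching the four GPUs selected by --include.
ds_model = deepspeed.init_inference(
    model,  # the Hugging Face model constructed earlier
    tensor_parallel={"tp_size": 4},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```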
What should I do if I want to run DeepSpeed inference across multiple nodes?
Thanks!