DeepSpeedExamples
Does DeepSpeed-FastGen support the Ascend NPU?
Does DeepSpeed-FastGen support the Ascend NPU, e.g. for deepseek-r1-distilled-qwen2.5-32b?
@RyanOvO - DeepSpeed supports the Ascend NPU, but I don't believe FastGen has been tested there. @hipudding or @xuedinge233, do you know?
We have not tested it before. We will test it now.
[2025-03-24 06:12:22,548] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to npu (auto detect)
torch npu: True
[2025-03-24 06:12:25,338] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-24 06:12:25,338] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend hccl
[2025-03-24 06:12:35,567] [INFO] [engine_v2.py:82:__init__] Building model...
...
Config max_tokens=768 type='rms_norm' channels=4096 residual_dtype=torch.bfloat16 input_dtype=torch.bfloat16 output_dtype=torch.bfloat16 eps=1e-05 is not supported by <class 'deepspeed.inference.v2.modules.implementations.pre_norm.cuda_pre_rms.DSPreRMSCUDAModule'>
Currently, DeepSpeed-MII only supports CPU and CUDA; the NPU is not supported.
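The "Setting ds_accelerator to npu (auto detect)" log line above shows that DeepSpeed's accelerator auto-detection does pick up the NPU, even though FastGen's CUDA-specific modules then fail. As a minimal sketch (not DeepSpeed's actual implementation), that auto-detect step can be approximated by probing which backend packages are importable; the function name `detect_accelerator` and the probing logic here are illustrative assumptions:

```python
import importlib.util

def detect_accelerator() -> str:
    """Simplified illustration of DeepSpeed-style accelerator auto-detection.

    NOTE: this is an assumption-laden sketch, not DeepSpeed's real logic.
    It only checks whether a backend's Python package is importable.
    """
    # Ascend NPU support ships as the separate torch_npu package.
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu"
    # A CUDA-enabled torch build would still need torch.cuda.is_available()
    # to confirm a usable device; importability alone is not sufficient.
    if importlib.util.find_spec("torch") is not None:
        return "cuda"
    return "cpu"

print(detect_accelerator())
```

Even when this step resolves to `npu`, the error above shows that the inference-v2 module registry only has CUDA implementations (e.g. `DSPreRMSCUDAModule`), which is why FastGen/MII fails on the NPU.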