DeepSpeedExamples
Does DeepSpeed-FastGen support the Ascend NPU?
Does DeepSpeed-FastGen support the Ascend NPU, e.g. for deepseek-r1-distilled-qwen2.5-32b?
@RyanOvO - DeepSpeed supports the Ascend NPU, but I don't believe FastGen has been tested there. @hipudding or @xuedinge233, do you know?
We have not tested it before. We will test it now.
[2025-03-24 06:12:22,548] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to npu (auto detect)
torch npu: True
[2025-03-24 06:12:25,338] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-03-24 06:12:25,338] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend hccl
[2025-03-24 06:12:35,567] [INFO] [engine_v2.py:82:__init__] Building model...
...
Config max_tokens=768 type='rms_norm' channels=4096 residual_dtype=torch.bfloat16 input_dtype=torch.bfloat16 output_dtype=torch.bfloat16 eps=1e-05 is not supported by <class 'deepspeed.inference.v2.modules.implementations.pre_norm.cuda_pre_rms.DSPreRMSCUDAModule'>
Currently, DeepSpeed-MII only supports CPU and CUDA; the NPU is not supported.
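The "Setting ds_accelerator to npu (auto detect)" log line above shows that DeepSpeed's accelerator auto-detection does pick up the NPU, even though FastGen's CUDA-specific modules then fail. As a minimal sketch (not DeepSpeed's actual implementation), that auto-detect step can be approximated by probing which backend packages are importable; the function name `detect_accelerator` and the probing logic here are illustrative assumptions:

```python
import importlib.util

def detect_accelerator() -> str:
    """Simplified illustration of DeepSpeed-style accelerator auto-detection.

    NOTE: this is an assumption-laden sketch, not DeepSpeed's real logic.
    It only checks whether a backend's Python package is importable.
    """
    # Ascend NPU support ships as the separate torch_npu package.
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu"
    # A CUDA-enabled torch build would still need torch.cuda.is_available()
    # to confirm a usable device; importability alone is not sufficient.
    if importlib.util.find_spec("torch") is not None:
        return "cuda"
    return "cpu"

print(detect_accelerator())
```

Even when this step resolves to `npu`, the error above shows that the inference-v2 module registry only has CUDA implementations (e.g. `DSPreRMSCUDAModule`), which is why FastGen/MII fails on the NPU.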