
[Hardware] Add support for Huawei Ascend NPU

Chendong98 opened this issue 9 months ago · 10 comments

  1. Single Controller:

    • Change placement group resources from GPU to NPU
    • Integrate Huawei’s HCCL collective communication library
  2. Megatron:

    • Adapt Megatron to the Huawei Ascend NPU using MindSpeed, and upgrade Megatron to version 0.6.0 to comply with MindSpeed’s requirements.
    • Adapt Megatron-Core 0.6.0’s ParamAndGradBuffer for synchronizing the weights between Megatron-LM and vLLM.
    • Replace operators in ParallelLlamaModel, including RMSNorm, flash attention, RoPE, and pad/unpad.
  3. vLLM:

    • Use this PR for vLLM Ascend support.
    • Add the SPMD version of vLLM 0.6.4.post1
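The placement-group change in step 1 can be sketched in a few lines. This is a hypothetical illustration only: it assumes Ray exposes Ascend devices under an `"NPU"` custom resource key, and the helper name is made up, not verl’s actual code.

```python
# Hypothetical sketch of step 1: requesting "NPU" instead of "GPU" in the
# Ray placement-group bundles. The resource key and helper name are
# assumptions, not verl's actual code.
def make_bundles(world_size: int, device_key: str = "NPU") -> list:
    """Build one bundle per worker, asking for `device_key` rather than GPU."""
    return [{"CPU": 1, device_key: 1} for _ in range(world_size)]

bundles = make_bundles(4)
# With Ray available, these bundles would then be consumed as something like:
#   pg = ray.util.placement_group(bundles, strategy="PACK")
print(bundles[0])  # {'CPU': 1, 'NPU': 1}
```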

Chendong98 avatar Feb 04 '25 17:02 Chendong98

Just wondering, does the FSDP backend work with NPU?

vermouth1992 avatar Feb 05 '25 02:02 vermouth1992

Just wondering, does the FSDP backend work with NPU?

The FSDP backend can work with NPU, but there are two issues to be addressed:

  1. torch.logsumexp does not support bf16 on NPU;
  2. FlashAttention-2 should be disabled in Transformers.
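The usual remedies for these two issues might look roughly like the sketch below. This is illustrative only, not verl’s actual patch: upcasting to fp32 before `logsumexp` and loading the model with Transformers’ `attn_implementation="eager"` are the common workarounds, and the pure-Python `logsumexp` merely stands in for the higher-precision computation.

```python
import math

# Issue 1: torch.logsumexp lacks bf16 support on NPU, so the usual remedy
# is to upcast first, e.g. logits.float().logsumexp(dim=-1).
# Issue 2: FlashAttention-2 can be disabled when loading the model, e.g.
#   AutoModelForCausalLM.from_pretrained(..., attn_implementation="eager")
# Both lines above are illustrative assumptions, not verl's actual patch.

def logsumexp(xs):
    """Numerically stable log-sum-exp, as computed in full precision."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

print(round(logsumexp([1.0, 2.0, 3.0]), 4))  # 3.4076
```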

Chendong98 avatar Feb 06 '25 07:02 Chendong98

got ModuleNotFoundError: No module named 'flash_attn' error

huangk10 avatar Feb 10 '25 04:02 huangk10

The code throws this error after commenting out the flash_attn import:

    work = group.broadcast([tensor], opts)
    RuntimeError: create:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:91 HCCL function error: HcclCommInitRootInfo(numRanks, &rootInfo, rank, &(comm->hcclComm_)), error code is 2
    [ERROR] 2025-02-10-19:56:18 (PID:1057704, Device:0, RankID:1) ERR02200 DIST call hccl api failed.
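For anyone hitting a similar HCCL init failure: one generic thing to rule out before `init_process_group(backend="hccl")` is an incomplete distributed rendezvous configuration. The sketch below is an illustrative pre-flight check only; the names are assumptions, not verl’s code, and error code 2 may well have other causes.

```python
import os

# Illustrative pre-flight check before torch.distributed.init_process_group
# with the "hccl" backend (via torch_npu). Names here are assumptions,
# not verl's actual code.
REQUIRED = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")

def missing_dist_env(env=None):
    """Return the rendezvous variables that are not set."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED if k not in env]

# With everything set, initialization would then proceed as:
#   torch.distributed.init_process_group(backend="hccl")
print(missing_dist_env({"MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500"}))
```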

huangk10 avatar Feb 10 '25 11:02 huangk10

The code throws this error after commenting out the flash_attn import:

    work = group.broadcast([tensor], opts)
    RuntimeError: create:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:91 HCCL function error: HcclCommInitRootInfo(numRanks, &rootInfo, rank, &(comm->hcclComm_)), error code is 2
    [ERROR] 2025-02-10-19:56:18 (PID:1057704, Device:0, RankID:1) ERR02200 DIST call hccl api failed.

Thank you for your feedback! We've addressed the issue you mentioned in the latest commit.

Chendong98 avatar Feb 10 '25 15:02 Chendong98

The code throws this error after commenting out the flash_attn import:

    work = group.broadcast([tensor], opts)
    RuntimeError: create:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:91 HCCL function error: HcclCommInitRootInfo(numRanks, &rootInfo, rank, &(comm->hcclComm_)), error code is 2
    [ERROR] 2025-02-10-19:56:18 (PID:1057704, Device:0, RankID:1) ERR02200 DIST call hccl api failed.

Thank you for your feedback! We've addressed the issue you mentioned in the latest commit.

It works.

huangk10 avatar Feb 11 '25 00:02 huangk10

Dude, do you work at Huawei? I’d like to get in touch with you~

Viper403 avatar Feb 11 '25 06:02 Viper403

Dude, do you work at Huawei? I’d like to get in touch with you~

Yes, I work at Huawei, and my email is [email protected]

Chendong98 avatar Feb 11 '25 16:02 Chendong98

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Feb 26 '25 00:02 CLAassistant

This helps a lot. When will this feature be merged?

dawnranger avatar Feb 27 '25 02:02 dawnranger