
[Feature package] Full feature support with Ascend NPU

Open · hipudding opened this issue 2 years ago · 2 comments

Background

Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.

CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.

PyTorch has officially announced support for the Ascend NPU (through the PrivateUse1 dispatch key); please see the PrivateUse1 tutorial here.

Previous work

NPU accelerator support has already been merged (see #3595, #3831), which makes it possible to use the NPU as a backend accelerator for basic training and inference tasks. However, to achieve full support, more features need to be implemented.
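The merged NPU support plugs into DeepSpeed's accelerator abstraction, which lets framework code request device operations without hard-coding `cuda`. The sketch below illustrates the pattern only; the class names and the selection logic are simplified stand-ins modeled loosely on DeepSpeed's `get_accelerator()`, not its actual implementation.

```python
from abc import ABC, abstractmethod

class Accelerator(ABC):
    """Device-neutral interface; callers never hard-code 'cuda'."""
    @abstractmethod
    def device_name(self, index: int) -> str:
        ...

class CudaAccelerator(Accelerator):
    def device_name(self, index: int) -> str:
        return f"cuda:{index}"

class NpuAccelerator(Accelerator):
    def device_name(self, index: int) -> str:
        return f"npu:{index}"

def get_accelerator(backend: str = "npu") -> Accelerator:
    # Real code detects the installed backend (e.g. whether torch_npu
    # imports successfully); here selection is explicit for illustration.
    return {"cuda": CudaAccelerator, "npu": NpuAccelerator}[backend]()

# Training code stays device-agnostic:
device = get_accelerator("npu").device_name(0)
```

With this indirection in place, adding a new device family is a matter of registering another `Accelerator` subclass rather than touching the training loop.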

Sub tasks

Here is a list of features that need to be implemented or tested.

| Status | Title | Assigned to |
| --- | --- | --- |
| Done | Unit tests | @RUAN-ZX |
| Done | FP16 | @minchao-sun, @wuhhu |
| Done | BF16 | @minchao-sun, @wuhhu |
| Done | Gradient Accumulation | @minchao-sun, @wuhhu |
| Done | Data Parallelism | @minchao-sun, @wuhhu |
| Done | Pipeline Parallelism | @RUAN-ZX |
| Done | ZeRO-1 | @misstek |
| Done | ZeRO-2 | @misstek |
| Done | ZeRO-3 | @misstek |
| Done | Activation Checkpointing | @CurryRice233 |
| Done | Fused Adam | @CurryRice233 |
| Done | Mixture of Experts (MoE) | @wangshuai09 |
| Processing | RLHF | @wangshuai09, @CurryRice233 |
| Done | ZeRO Offload | @hipudding |
| Processing | ZeRO Infinity | @misstek |
| Done | 1-bit Adam | @RUAN-ZX |
| Done | 1-bit LAMB | @RUAN-ZX |
| Done | 0/1 Adam | @minchao-sun |
| Processing | Curriculum Learning | @minchao-sun |
| Processing | Layer Dropping | @minchao-sun |

hipudding avatar Oct 26 '23 02:10 hipudding

I see that the NPU FusedAdam is implemented with torch_npu.npu_apply_adam_w. When implementing new features in the future, does the NPU team intend to add support through torch_npu, or might kernels also be implemented in DeepSpeed?

delock avatar Nov 22 '23 07:11 delock

> I see that the NPU FusedAdam is implemented with torch_npu.npu_apply_adam_w. When implementing new features in the future, does the NPU team intend to add support through torch_npu, or might kernels also be implemented in DeepSpeed?

The NPU supports both modes. Personally, I prefer the first one, where users can directly invoke the torch_npu interface regardless of the underlying implementation.

CurryRice233 avatar Nov 22 '23 08:11 CurryRice233
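For context on the FusedAdam discussion above: a fused op such as torch_npu.npu_apply_adam_w performs the entire AdamW update in a single kernel launch instead of a chain of elementwise ops. The plain-Python sketch below shows the per-element math being fused; the function name, argument order, and defaults are illustrative, not the op's real signature.

```python
import math

def adamw_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    A fused kernel applies this math to whole tensors in one launch;
    this scalar version only illustrates the computation.
    """
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** step)          # bias correction
    v_hat = v / (1 - beta2 ** step)
    p = p - lr * weight_decay * p            # decoupled weight decay
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

Whether this computation lives behind a vendor op in torch_npu or a kernel built inside DeepSpeed, the optimizer-facing interface stays the same, which is the trade-off the two modes above describe.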