
Add Ascend NPU as a backend

Open hipudding opened this issue 2 years ago • 8 comments

Description & Motivation

Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.

CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.

PyTorch has officially announced support for Ascend NPU (through the PrivateUse1 dispatch key); please see the PrivateUse1 tutorial here.
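As a rough illustration of the mechanism (not Ascend-specific code), the PrivateUse1 hook lets an out-of-tree backend claim a device name; a plugin like torch_npu does this, plus registering its kernels, when imported. A minimal sketch with plain PyTorch (torch >= 2.0 assumed):

```python
import torch

# PyTorch reserves one dispatch key, PrivateUse1, for out-of-tree
# backends. Renaming it is the first step a plugin like torch_npu
# performs on import; kernel registration happens in C++ and is
# not shown here.
torch.utils.rename_privateuse1_backend("npu")

# A full backend would also register a device module so that the
# torch.npu.* namespace exists, roughly:
#   torch._register_device_module("npu", torch_npu.npu)

# After the rename, "npu" parses as a valid device type:
dev = torch.device("npu", 0)
print(dev)
```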

Provide new backend support for pytorch-lightning, allowing users with Ascend NPUs to also benefit from the convenient development and acceleration capabilities that pytorch-lightning provides.

Pitch

I'd like to add a new accelerator and register it in the accelerator registry, making it possible for users to select this new backend with: "accelerator='npu'".

Alternatives

No response

Additional context

I have written a demo; please refer to: https://github.com/Lightning-AI/pytorch-lightning/pull/19308

cc @borda

hipudding avatar Feb 20 '24 02:02 hipudding

@lantiga Good day. Could you please give me some suggestions? Thanks.

hipudding avatar Feb 26 '24 06:02 hipudding

@hipudding have you managed to solve this issue?

andre0rlandi avatar Apr 05 '24 19:04 andre0rlandi

@hipudding have you managed to solve this issue?

Not yet. I have created a draft PR, which is a demo for this new backend. I need a reply from the community to guide my subsequent development tasks. However, I haven't gotten any response from the community yet, so this issue is currently pending.

hipudding avatar Apr 07 '24 09:04 hipudding

@hipudding

Hello,

I have recently utilized the draft code here for training with PyTorch Lightning on NPU. It has proven to be quite useful, and I would like to extend my gratitude for your contribution. I am hopeful that the pending pull request (PR) can be reviewed and merged into the official repository promptly.

Thank you once again for your valuable work.

Best regards

fandengdong avatar May 10 '24 09:05 fandengdong

Thank you once again for your valuable work.

I think it could be made into an extension repo.

Borda avatar May 10 '24 09:05 Borda

I think it could be made into an extension repo.

@Borda Thanks for your reply. Could you please tell me how to proceed with this extension repo? Is there any guidance on extension development, or should we just use monkey patching?

hipudding avatar May 11 '24 01:05 hipudding

I think it could be made into an extension repo.

@Borda Thanks for your reply. Could you please tell me how to proceed with this extension repo? Is there any guidance on extension development, or should we just use monkey patching?

If I want to use Lightning on Ascend NPU now, could you give me some draft templates or suggestions? Thanks!

XXXHUA avatar Jun 26 '24 03:06 XXXHUA

I think it could be made into an extension repo.

@Borda Thanks for your reply. Could you please tell me how to proceed with this extension repo? Is there any guidance on extension development, or should we just use monkey patching?

If I want to use Lightning on Ascend NPU now, could you give me some draft templates or suggestions? Thanks!

Yes, here's the demo.

hipudding avatar Jun 26 '24 03:06 hipudding

Hi @hipudding, any updates? I have used your draft code for training. Although it runs pretty well with minimal changes, the speed is quite slow (910B vs. V100). I wonder whether the problem stems from torch_npu or the Lightning framework. Thank you in advance.

RobertLuo1 avatar Jul 09 '24 16:07 RobertLuo1

Hi @hipudding, any updates? I have used your draft code for training. Although it runs pretty well with minimal changes, the speed is quite slow (910B vs. V100). I wonder whether the problem stems from torch_npu or the Lightning framework. Thank you in advance.

Thanks for using this PR. Actually, I haven't done any analysis of the performance. If you are not using any strategy, it is only a simple wrapper around torch (torch_npu). I think you could write a simple demo that uses only torch (torch_npu) to find the root cause of this performance issue.

hipudding avatar Jul 10 '24 07:07 hipudding
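The suggestion above can be sketched as a bare-PyTorch timing loop (no Lightning) to isolate where the time goes. The model and sizes below are arbitrary placeholders, and on a real NPU you would add a synchronize call (the torch_npu analogue of torch.cuda.synchronize) before reading the clock:

```python
import time

import torch
from torch import nn


def benchmark(model: nn.Module, device: str, steps: int = 20) -> float:
    """Time `steps` forward+backward passes of `model` on `device`."""
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    x = torch.randn(32, 128, device=device)
    y = torch.randn(32, 128, device=device)

    # One warm-up step so one-off kernel/graph compilation is not timed.
    loss_fn(model(x), y).backward()
    opt.step()
    opt.zero_grad()

    # NOTE: on an accelerator, call the backend's synchronize() here and
    # after the loop so the wall-clock numbers are meaningful.
    start = time.perf_counter()
    for _ in range(steps):
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        opt.zero_grad()
    return time.perf_counter() - start


# Compare the same loop on "cpu" vs. the NPU (after `import torch_npu`,
# use device "npu:0"), and compare both against the Lightning run.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
print(f"cpu: {benchmark(model, 'cpu'):.3f}s for 20 steps")
```

If the bare loop is as slow as the Lightning run, the bottleneck is in torch_npu (or the model's kernels) rather than in the framework.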

Thanks for your quick reply! I have found something strange: when I train the autoencoder, the speed is normal and 2x better than V100. However, when I train the Transformer-based model, the speed is extremely slow (10x slower).

RobertLuo1 avatar Jul 10 '24 13:07 RobertLuo1

Thanks for your quick reply! I have found something strange: when I train the autoencoder, the speed is normal and 2x better than V100. However, when I train the Transformer-based model, the speed is extremely slow (10x slower).

Sorry, I'm not an expert in this area and I can't give you any advice.

hipudding avatar Jul 11 '24 11:07 hipudding

We are trying to evaluate the possibility of running PL on Huawei NPU. Thanks for your contribution. I agree with @Borda that making it an extension repo would be better.

Two examples would be:
https://github.com/Lightning-AI/lightning-Graphcore https://github.com/Lightning-AI/lightning-Habana

We are currently waiting to procure the hardware. Looking forward to your opinion.

24hours avatar Sep 27 '24 02:09 24hours

We are trying to evaluate the possibility of running PL on Huawei NPU. Thanks for your contribution. I agree with @Borda that making it an extension repo would be better.

Two examples would be: https://github.com/Lightning-AI/lightning-Graphcore https://github.com/Lightning-AI/lightning-Habana

We are currently waiting to procure the hardware. Looking forward to your opinion.

Thanks. We will try to make it an extension repo.

hipudding avatar Sep 29 '24 02:09 hipudding