torchtune

Add Ascend NPU as a backend

Open noemotiovon opened this issue 1 year ago • 4 comments

Description

:rocket: Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.

CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.

PyTorch has officially announced support for Ascend NPU (through the PrivateUse1 dispatch key); please see the PrivateUse1 tutorial here.


Motivation

With the growing number of developers leveraging Ascend NPUs for AI training and inference, I would like to propose adding support for the Ascend NPU backend to this project.


noemotiovon avatar Oct 10 '24 07:10 noemotiovon

Hi @noemotiovon, thanks for creating the issue. If I understand correctly, one gap is that we need to import torch_npu to access the NPU backend, is that correct? Looking at the Ascend/pytorch repo, I only see up to version 2.3 supported on main (ref), while we only support the latest stable version of PyTorch (2.4 and soon to be 2.5). Do you know if these more recent versions are currently supported by Ascend?

ebsmothers avatar Oct 10 '24 12:10 ebsmothers

@ebsmothers Thank you very much for taking the time to review my issue :smile:! torch-npu 2.4.0rc1 has now been released, and we can use it for testing (ref). You’re correct, we just need to import torch_npu based on the device. Would you accept a PR that uses this version (2.4.0rc1) of torch_npu for testing and verification? :smiley:
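
For illustration, "import torch_npu based on the device" could look roughly like the sketch below. This is not torchtune's actual code and not necessarily what the eventual PR does: the helper names `npu_package_installed` and `resolve_device` are hypothetical, and the sketch assumes that importing the `torch_npu` package is what makes the `npu` device type available to PyTorch.

```python
import importlib
import importlib.util


def npu_package_installed(module_name: str = "torch_npu") -> bool:
    """Check whether the NPU plugin package is installed, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_device(requested: str, module_name: str = "torch_npu") -> str:
    """Hypothetical helper: import the NPU plugin only when an NPU device is requested.

    `requested` is a device string such as "cpu", "cuda:0" or "npu:0".
    """
    device_type = requested.split(":")[0]
    if device_type != "npu":
        # Non-NPU devices need no extra import.
        return requested
    if not npu_package_installed(module_name):
        raise RuntimeError(
            f"Device '{requested}' was requested, but '{module_name}' is not installed."
        )
    # Assumption: importing the plugin registers the NPU backend with PyTorch.
    importlib.import_module(module_name)
    return requested
```

With a helper like this, a recipe would call `resolve_device(cfg_device)` once at startup and pass the returned string on to `torch.device(...)`; users without Ascend hardware never trigger the import.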

noemotiovon avatar Oct 10 '24 13:10 noemotiovon

@noemotiovon yep if you open a PR I am happy to review it!

ebsmothers avatar Oct 10 '24 13:10 ebsmothers

fyi looks like this has been requested in the past, but never followed up on: https://github.com/pytorch/torchtune/issues/1006

RdoubleA avatar Oct 10 '24 14:10 RdoubleA

Closed by #1826

joecummings avatar Dec 13 '24 13:12 joecummings