Add README Pipeline Status for Huawei Ascend NPU

Open xuedinge233 opened this issue 1 year ago • 4 comments

Hello! Following the merge of https://github.com/microsoft/DeepSpeed/pull/6445, I have implemented a CI pipeline to validate DeepSpeed on the Huawei Ascend NPU.

xuedinge233 avatar Sep 29 '24 02:09 xuedinge233

Summary

Implemented a CI pipeline for the Huawei Ascend NPU that runs the unit tests automatically and reports the results in the Pipeline Status tables.

Motivation

The Huawei Ascend NPU usage documentation has been updated, but the operational process is still not clearly described.

Implementation

The CI process includes the following steps:

  • Trigger CI on a daily schedule and on incoming pull requests.
  • Use the ascendai/cann Docker image to provide the base runtime environment.
  • Install the necessary libraries for PyTorch and DeepSpeed, and verify the environment post-installation.
  • Run the test files located in tests/unit, including accelerator autotuning, as the unit tests.
  • Update the README.md with the latest test results.
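
As a rough illustration of the "verify the environment post-installation" step, a check along the following lines could run before the unit tests. This is a minimal sketch, not code from the PR: the package list (in particular torch_npu as the PyTorch adapter for Ascend) and the function names check_environment, missing_packages, and main are assumptions made for illustration.

```python
import importlib.util

# Packages the CI description implies. Treating torch_npu as the
# Ascend adapter for PyTorch is an assumption of this sketch.
REQUIRED_PACKAGES = ("torch", "torch_npu", "deepspeed")


def check_environment(packages=REQUIRED_PACKAGES):
    """Map each package name to True if it can be located for import.

    find_spec only locates the package; it does not import it, so the
    check stays cheap and has no side effects on the device.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def missing_packages(status):
    """Return the names of packages that were not found, sorted."""
    return sorted(name for name, found in status.items() if not found)


def main():
    """Print a short report and return a process exit code (0 = all found)."""
    status = check_environment()
    for name, found in status.items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
    return 1 if missing_packages(status) else 0
```

A CI step could call main() and pass its return value to sys.exit, so that the job fails early when a dependency is absent instead of failing later inside the test run.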

xuedinge233 avatar Sep 30 '24 07:09 xuedinge233

Below are some screenshots of the CI run: [image] [image] [image]

xuedinge233 avatar Sep 30 '24 07:09 xuedinge233

@xuedinge233 - can you please address the formatting test failure?

loadams avatar Oct 07 '24 17:10 loadams

Yes, I have already fixed the formatting failure.

xuedinge233 avatar Oct 08 '24 06:10 xuedinge233