Add Comprehensive Support for Ascend 910B4-1 GPU

Open asianuxchina opened this issue 3 months ago • 0 comments

What would you like to be added: Add native support for Ascend 910B4-1 GPU in HAMi (Hybrid AI Management Interface), including device discovery, resource monitoring, job scheduling optimization, and compatibility with Ascend-specific AI frameworks.

What type of PR is this? /kind feature

What this PR does / why we need it: This PR enables HAMi to fully recognize, manage, and optimize workloads for Ascend 910B4-1 GPU. It includes implementing device driver compatibility checks, integrating Ascend-specific metrics collection (e.g., compute utilization, memory bandwidth), optimizing job scheduling algorithms for Ascend architecture, and ensuring seamless integration with Ascend Tensor Engine and CANN toolkit. This support is critical for HAMi users to efficiently leverage Ascend 910B4-1's AI acceleration capabilities in distributed computing environments.

Which issue(s) this PR fixes: Fixes #(please fill in the corresponding issue number if applicable)

Special notes for your reviewer:

Need to validate HAMi's device plugin for Ascend 910B4-1 against latest CANN versions
Ensure proper handling of Ascend 910B4-1's multi-core architecture in resource allocation
Test job deployment workflows with Ascend-optimized TensorFlow/PyTorch frameworks
Verify compatibility with Ascend Docker Runtime for containerized workload management in HAMi

Does this PR introduce a user-facing change?: Yes, users will be able to select Ascend 910B4-1 GPU as a compute resource in HAMi, monitor its performance metrics, and run AI workloads optimized for this hardware with improved resource utilization.

Sep 10 '25 07:09 asianuxchina