gpu-operator
Enhance MIG Support Detection for NVIDIA GPUs introduced in R580
Summary
This PR improves the Multi-Instance GPU (MIG) capability detection logic in the NVIDIA GPU Operator by expanding the list of recognized GPU models and moving the product checks into a dedicated pattern-matching helper.
Changes Made
1. Enhanced MIG Detection Logic (controllers/state_manager.go)
- Refactored the hasMIGCapableGPU function to use a dedicated helper function, isMIGCapableGPUProduct
- Expanded MIG support from 3 basic models to comprehensive architecture coverage
- Implemented structured pattern matching with clear architectural categorization
2. Comprehensive GPU Architecture Support
The updated detection now supports:
Hopper Architecture (Data Center)
- H100, H800, H200, H20, GH200
Ampere Architecture
- A100, A800, A30
Blackwell Architecture (Next Generation)
- GB200, B200, GB300, B300
Professional Workstation GPUs
- RTX PRO 6000
- RTX PRO 5000
- Dual format support: both the "rtx-pro-6000" and "rtx pro 6000" naming conventions are matched
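To make the model list above concrete, here is a minimal sketch of how a helper like isMIGCapableGPUProduct could classify a product string. It is not the code from controllers/state_manager.go: the lookup tables migCapableModels and rtxProModels are hypothetical names, and the sketch matches whole tokens (splitting on "-", "_" and spaces) instead of whatever pattern matching the PR actually uses. Token matching is chosen here so that both RTX PRO spellings are handled the same way and a name that merely contains a model string is not accepted.

```go
package main

import "strings"

// migCapableModels holds the model tokens named in this PR description.
// Hypothetical table: the real pattern list in state_manager.go may differ.
var migCapableModels = map[string]bool{
	// Ampere
	"a100": true, "a800": true, "a30": true,
	// Hopper
	"h100": true, "h800": true, "h200": true, "h20": true, "gh200": true,
	// Blackwell
	"b200": true, "gb200": true, "b300": true, "gb300": true,
}

// rtxProModels holds the model numbers that follow "RTX PRO".
// Hypothetical table, for illustration only.
var rtxProModels = map[string]bool{"6000": true, "5000": true}

// isMIGCapableGPUProduct reports whether a GPU product string names one of
// the models above. This is an assumed implementation, not the PR's code.
func isMIGCapableGPUProduct(product string) bool {
	// Lowercase and split on the separators seen in product names, so that
	// "rtx-pro-6000" and "rtx pro 6000" produce the same tokens.
	tokens := strings.FieldsFunc(strings.ToLower(product), func(r rune) bool {
		return r == '-' || r == '_' || r == ' '
	})
	for i, tok := range tokens {
		// Single-token models such as "h100" or "gb200".
		if migCapableModels[tok] {
			return true
		}
		// Three-token models: "rtx", "pro", then a supported number.
		if tok == "rtx" && i+2 < len(tokens) &&
			tokens[i+1] == "pro" && rtxProModels[tokens[i+2]] {
			return true
		}
	}
	return false
}
```

Under this sketch, "NVIDIA-H100-80GB-HBM3" and "rtx pro 6000" are classified as MIG-capable, while "Tesla-T4" and the empty string are not. One trade-off of exact token matching is that it would miss any product name that fuses a suffix onto the model token, so a real implementation may prefer looser patterns.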
Verification and Testing
Test Coverage
- All supported GPU models across architectures
- Multiple naming format variations
- Negative test cases for non-MIG GPUs (T4, V100)
- Edge cases (empty strings, partial matches)
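The coverage categories above map naturally onto a table-driven check. The sketch below is an illustration only and is not taken from the PR's test file: the type migCase, the table migCases, and the runner checkMIGCases are hypothetical names, and the product strings are example inputs chosen for this sketch. The runner accepts any classifier with the signature func(string) bool, so a helper such as isMIGCapableGPUProduct can be passed in directly.

```go
package main

import "fmt"

// migCase pairs a product string with the expected detection result.
type migCase struct {
	product string
	want    bool
}

// migCases covers the four categories listed above.
// The strings are illustrative assumptions, not the PR's actual fixtures.
var migCases = []migCase{
	// Supported models across architectures.
	{"NVIDIA-H100-80GB-HBM3", true},
	{"NVIDIA-A30", true},
	{"NVIDIA-GB200", true},
	// Naming format variations.
	{"rtx-pro-6000", true},
	{"rtx pro 6000", true},
	// Negative cases for non-MIG GPUs.
	{"Tesla-T4", false},
	{"Tesla-V100-SXM2-16GB", false},
	// Edge cases: empty input, and a name that contains "a30" without being an A30.
	{"", false},
	{"NVIDIA-RTX-A3000", false},
}

// checkMIGCases runs classify over every case and returns one message per
// mismatch; an empty result means all cases passed.
func checkMIGCases(classify func(string) bool) []string {
	var failures []string
	for _, c := range migCases {
		if got := classify(c.product); got != c.want {
			failures = append(failures,
				fmt.Sprintf("%q: got %v, want %v", c.product, got, c.want))
		}
	}
	return failures
}
```

Inside a Go test, the returned slice could be reported with t.Errorf for each entry. The "NVIDIA-RTX-A3000" case is worth keeping in any variant of this table, because a plain substring check for "a30" would accept it.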