hipBLASLt
[Draft] Distributed tuning
[Draft] TensileParallel Documentation
Overview
TensileParallel is an enhancement to the original Tensile tuning tool that enables parallel tuning across multiple GPU devices. It distributes the tuning workload across the available GPUs, significantly reducing the total tuning time.
Features
- Multi-GPU support for parallel tuning
- Automatic workload distribution and load balancing
- Fallback mechanism to standard Tensile execution
- Comprehensive logging and error handling
- Automatic results merging from multiple devices
Prerequisites
- ROCm environment with hipBLASLt installed
- Python 3.x
- Multiple AMD GPU devices (optional)
Installation
No additional installation required beyond the standard hipBLASLt setup.
Usage
Basic Command
cd /hipBLASLt/tensilelite
./Tensile/bin/TensileParallel <config.yaml> <output_path>
Configuration for Device Selection
Modify your config.yaml to specify GPU devices using the DeviceList parameter under GlobalParameters:
- Specific Devices
GlobalParameters:
  ...
  DeviceList: [1, 2, 3] # Use GPUs 1, 2, and 3
- All Available Devices
GlobalParameters:
  ...
  DeviceList: [-1] # Use all available GPUs
- Default Behavior
  - If DeviceList is not specified or is empty, TensileParallel falls back to standard Tensile execution
  - If any of the specified devices are unavailable, it also falls back to standard Tensile execution
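As a rough illustration of these fallback rules, the sketch below shows how a wrapper could decide between parallel and standard execution. The function name select_devices and the available_devices argument are assumptions made for this example, not part of the actual TensileParallel code.

# Illustrative sketch only; the real TensileParallel performs its own checks.
import yaml

def select_devices(config_path, available_devices):
    """Return the GPU ids to tune on, or None to fall back to standard Tensile."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    params = config.get("GlobalParameters") or {}
    requested = params.get("DeviceList") or []

    if not requested:                 # DeviceList missing or empty
        return None
    if requested == [-1]:             # [-1] means "use every available GPU"
        return list(available_devices)
    if not set(requested).issubset(available_devices):
        return None                   # a requested device is not present
    return requested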
Execution Flow
- Configuration Loading
  - Validates input configuration
  - Checks device availability
- Workload Distribution
  - Analyzes problem sizes
  - Distributes workload based on complexity (see the sketch after this list)
  - Generates device-specific configurations
- Parallel Execution
  - Runs tuning processes on selected devices
  - Monitors progress and handles errors
- Results Processing
  - Merges results from all devices
  - Generates execution summary
  - Creates consolidated output
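A minimal sketch of the distribution and launch steps is shown below. It assumes problem sizes are (M, N, K) tuples, uses M*N*K as a stand-in complexity estimate, assigns work greedily to the least-loaded device, and launches one standard Tensile process per device-specific configuration; the actual heuristics and process handling in TensileParallel may differ.

# Illustrative sketch of the distribution and launch steps; not the actual implementation.
import subprocess
from collections import defaultdict

def distribute(problem_sizes, device_ids):
    """Greedily assign each (M, N, K) problem to the least-loaded GPU."""
    load = {d: 0 for d in device_ids}
    assignment = defaultdict(list)
    for m, n, k in sorted(problem_sizes, key=lambda s: s[0] * s[1] * s[2], reverse=True):
        device = min(load, key=load.get)   # least-loaded device so far
        assignment[device].append((m, n, k))
        load[device] += m * n * k          # rough complexity estimate
    return assignment

def launch(device_configs, output_path):
    """Run one Tensile process per device-specific config and wait for all of them."""
    procs = [
        subprocess.Popen(["./Tensile/bin/Tensile", cfg,
                          f"{output_path}/outputs/gpu_{device}"])
        for device, cfg in device_configs.items()
    ]
    return [p.wait() for p in procs]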
Output Structure
output_path/
├── config_gpu_*.yaml      # Device-specific configurations
├── outputs/
│   ├── gpu_0/             # Results from GPU 0
│   ├── gpu_1/             # Results from GPU 1
│   └── ...
└── merged_output/         # Final merged results
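The merge step can be pictured with the sketch below, which simply gathers the per-GPU result files into merged_output/. The real merge in Tensile combines the tuned logic YAML files rather than copying them, so this is only a simplified stand-in.

# Illustrative sketch of the merge step; the real merge combines tuned logic files.
import shutil
from pathlib import Path

def merge_results(output_path):
    """Gather per-GPU result files from outputs/gpu_*/ into merged_output/."""
    out = Path(output_path)
    merged = out / "merged_output"
    merged.mkdir(parents=True, exist_ok=True)
    for gpu_dir in sorted((out / "outputs").glob("gpu_*")):
        for result in gpu_dir.rglob("*.yaml"):
            # Prefix each file with its GPU directory name to avoid collisions.
            shutil.copy(result, merged / f"{gpu_dir.name}_{result.name}")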