
[Draft] Distributed tuning

Open · brianchooou opened this issue 1 year ago • 1 comment

[Draft] TensileParallel Documentation

Overview

TensileParallel is an enhancement to the original Tensile tuning tool that enables parallel tuning across multiple GPU devices. It optimizes the tuning process by distributing workloads across available GPUs, significantly reducing the total tuning time.

Features

  • Multi-GPU support for parallel tuning
  • Automatic workload distribution and load balancing
  • Fallback mechanism to standard Tensile execution
  • Comprehensive logging and error handling
  • Automatic results merging from multiple devices

Prerequisites

  • ROCm environment with hipBLASLt installed
  • Python 3.x
  • Multiple AMD GPU devices (optional)

Installation

No additional installation required beyond the standard hipBLASLt setup.

Usage

Basic Command

cd /hipBLASLt/tensilelite
./Tensile/bin/TensileParallel <config.yaml> <output_path>

Configuration for Device Selection

Modify your config.yaml to specify GPU devices using the DeviceList parameter under GlobalParameters:

  1. Specific Devices

     GlobalParameters:
       ...
       DeviceList: [1, 2, 3]  # Use GPUs 1, 2, and 3

  2. All Available Devices

     GlobalParameters:
       ...
       DeviceList: [-1]  # Use all available GPUs

  3. Default Behavior
     • If DeviceList is not specified or is empty, TensileParallel falls back to standard Tensile execution
     • If any of the specified devices are unavailable, TensileParallel falls back to standard Tensile execution (a sketch of this fallback check follows below)
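
The fallback decision itself is internal to TensileParallel; the sketch below only illustrates the rules listed above. Here get_available_device_ids(), run_parallel_tuning(), and run_standard_tensile() are hypothetical placeholders, not actual TensileParallel functions.

# Sketch only: illustrates the DeviceList fallback rules described above.
# get_available_device_ids(), run_parallel_tuning(), and run_standard_tensile()
# are hypothetical placeholders, not real TensileParallel APIs.

def select_devices(global_params):
    requested = global_params.get("DeviceList") or []
    available = get_available_device_ids()   # hypothetical: enumerate visible GPUs

    if not requested:                         # DeviceList missing or empty
        return []                             # -> standard Tensile execution
    if requested == [-1]:                     # [-1] means "use all available GPUs"
        return available
    if all(dev in available for dev in requested):
        return requested
    return []                                 # requested devices unavailable -> fall back

def run(config_path, output_path, global_params):
    devices = select_devices(global_params)
    if devices:
        run_parallel_tuning(config_path, output_path, devices)
    else:
        run_standard_tensile(config_path, output_path)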

Execution Flow

  1. Configuration Loading
     • Validates the input configuration
     • Checks device availability
  2. Workload Distribution
     • Analyzes problem sizes
     • Distributes the workload based on complexity
     • Generates device-specific configurations
  3. Parallel Execution
     • Runs tuning processes on the selected devices (see the sketch after this list)
     • Monitors progress and handles errors
  4. Results Processing
     • Merges results from all devices
     • Generates an execution summary
     • Creates consolidated output
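
The scheduling logic above is internal to the tool, but the parallel-execution step (items 2–3) can be sketched as follows. This is only an illustration: it assumes the device-specific configs generated during workload distribution, that the standard Tensile driver accepts the same <config.yaml> <output_path> arguments as in the Basic Command, and that HIP_VISIBLE_DEVICES is used to pin each run to one GPU.

# Sketch only: launches the standard Tensile driver once per selected GPU, with
# HIP_VISIBLE_DEVICES restricting each run to its own device, then waits for all
# runs to finish.  The driver path and per-device config naming are assumptions.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_device(device_id, config_path, output_root):
    env = os.environ.copy()
    env["HIP_VISIBLE_DEVICES"] = str(device_id)          # pin this run to one GPU
    out_dir = os.path.join(output_root, "outputs", f"gpu_{device_id}")
    os.makedirs(out_dir, exist_ok=True)
    return subprocess.run(["./Tensile/bin/Tensile", config_path, out_dir],
                          env=env).returncode

def run_parallel(device_configs, output_root):
    # device_configs maps GPU id -> device-specific config, e.g. {0: "config_gpu_0.yaml"}
    with ThreadPoolExecutor(max_workers=len(device_configs)) as pool:
        futures = {dev: pool.submit(run_on_device, dev, cfg, output_root)
                   for dev, cfg in device_configs.items()}
    for dev, future in futures.items():
        rc = future.result()
        status = "ok" if rc == 0 else f"failed (rc={rc})"
        print(f"gpu_{dev}: {status}")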

Output Structure

output_path/
├── config_gpu_*.yaml        # Device-specific configurations
├── outputs/
│   ├── gpu_0/              # Results from GPU 0
│   ├── gpu_1/              # Results from GPU 1
│   └── ...
└── merged_output/          # Final merged results
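
As a final illustration, collecting the per-device results into merged_output/ could look like the sketch below. The directory names follow the tree above; the real merge step also reconciles overlapping tuning results, which depends on the Tensile output format and is omitted here.

# Sketch only: copies result YAML files from each outputs/gpu_*/ directory into
# merged_output/, prefixing filenames with the GPU directory to avoid collisions.
# The real TensileParallel merge also reconciles overlapping tuning entries.
import glob
import os
import shutil

def collect_results(output_root):
    merged_dir = os.path.join(output_root, "merged_output")
    os.makedirs(merged_dir, exist_ok=True)
    pattern = os.path.join(output_root, "outputs", "gpu_*", "**", "*.yaml")
    for path in glob.glob(pattern, recursive=True):
        rel = os.path.relpath(path, os.path.join(output_root, "outputs"))
        gpu_name = rel.split(os.sep)[0]       # e.g. "gpu_0"
        dest = os.path.join(merged_dir, f"{gpu_name}_{os.path.basename(path)}")
        shutil.copy2(path, dest)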

brianchooou · Dec 10 '24 04:12

Please resolve merge conflicts or close this PR to complete the task of importing PRs from this repo to the monorepo.

jayhawk-commits · Jun 20 '25 18:06