feat: add GPU support to metaflow-dev minikube setup

Open cnaples79 opened this issue 4 months ago • 2 comments

Summary

  • Add optional GPU support for minikube in metaflow-dev, with automatic detection of NVIDIA and AMD GPUs
  • Provide manual control via MINIKUBE_ENABLE_GPU environment variable
  • Enable GPU workloads like @resources(gpu=1) in local development environments

Changes Made

  • Auto-detection logic: Checks for NVIDIA (nvidia-smi) and AMD (rocm-smi) GPUs
  • Environment variable control: MINIKUBE_ENABLE_GPU=auto|true|false (default: auto)
  • User feedback: Informative messages about GPU detection status during startup
  • Help documentation: Updated help text with environment variable usage
  • Conditional flag addition: Adds --gpus all to minikube start only when appropriate

Modes of Operation

  1. auto (default): Detects GPU availability and enables GPU support if a GPU is found
  2. true: Force enables GPU support regardless of detection
  3. false: Explicitly disables GPU support
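
For readers who want to see how the three modes fit together, here is a minimal shell sketch of the decision logic. It is not the Makefile code from this PR: the function names detect_gpu and minikube_gpu_flag are hypothetical, and the sketch assumes that finding nvidia-smi or rocm-smi on PATH is a good enough signal that a GPU is present.

```shell
#!/bin/sh
# Illustrative sketch only; detect_gpu and minikube_gpu_flag are
# hypothetical names, not identifiers from the PR.

# Assumption: a vendor tool on PATH means a usable GPU is present.
detect_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        return 0    # NVIDIA tooling found
    fi
    if command -v rocm-smi >/dev/null 2>&1; then
        return 0    # AMD ROCm tooling found
    fi
    return 1        # no known GPU tooling found
}

# Print the extra flag for `minikube start`, or nothing at all.
minikube_gpu_flag() {
    case "${MINIKUBE_ENABLE_GPU:-auto}" in
        true)
            # Forced on, regardless of detection.
            echo "--gpus all"
            ;;
        false)
            # Explicitly off: print nothing.
            ;;
        auto)
            if detect_gpu; then
                echo "--gpus all"
            fi
            ;;
        *)
            echo "Invalid MINIKUBE_ENABLE_GPU: ${MINIKUBE_ENABLE_GPU}" >&2
            return 1
            ;;
    esac
}
```

A caller could then run something like `minikube start $(minikube_gpu_flag)`, leaving the substitution unquoted so that `--gpus all` is split into two arguments. Rejecting unknown values instead of silently treating them as auto is a choice made for this sketch, not necessarily what the PR does.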

Test Plan

  • ✅ Verified Makefile syntax with make help
  • ✅ Tested dry-run with make -n setup-minikube (prints the "no GPU detected" message)
  • ✅ Tested forced enable with MINIKUBE_ENABLE_GPU=true (correctly adds --gpus all flag)
  • ✅ Confirmed help text displays new environment variable documentation

Example Usage

# Auto-detect GPU (default behavior)
make setup-minikube

# Force enable GPU support
MINIKUBE_ENABLE_GPU=true make setup-minikube

# Explicitly disable GPU support
MINIKUBE_ENABLE_GPU=false make setup-minikube

Fixes #2606

cnaples79 avatar Sep 17 '25 01:09 cnaples79

Thanks for the feedback @feltech! I've updated the implementation to address the Docker compatibility concerns:

Changes Made

🔧 Improved Docker Compatibility:

  • Default to --devices nvidia.com/gpu=all for NVIDIA GPUs (more compatible with different Docker configurations)
  • Keep --gpus all for AMD/other GPUs
  • This addresses the NixOS Docker issue you mentioned

⚙️ Enhanced Control Options:

Added a MINIKUBE_GPU_FLAG environment variable for explicit control:

  • auto (default): Smart selection based on GPU type
  • gpus: Force --gpus all format
  • devices: Force --devices nvidia.com/gpu=all format
  • Custom value: User-provided flag (e.g., --devices nvidia.com/gpu=2)

Example Usage

# Auto-detect best GPU flag (default)
make setup-minikube

# Force devices format (good for Docker compatibility issues)
MINIKUBE_GPU_FLAG=devices make setup-minikube

# Force legacy gpus format
MINIKUBE_GPU_FLAG=gpus make setup-minikube

# Custom GPU specification
MINIKUBE_GPU_FLAG="--devices nvidia.com/gpu=2" make setup-minikube

This should resolve the Docker configuration compatibility issues while maintaining flexibility for different setups. Let me know if this addresses your concerns!

cnaples79 avatar Sep 17 '25 12:09 cnaples79

Thanks for the clarification, and you're absolutely right: minikube doesn't support --devices. I've updated the PR to remove the --devices path and always pass --gpus all to minikube start when a GPU is detected or GPU support is forced via MINIKUBE_ENABLE_GPU=true.

Summary of changes:

  • Remove MINIKUBE_GPU_FLAG and the --devices nvidia.com/gpu=all path
  • Keep the simple, valid --gpus all flag for minikube
  • Preserve auto-detection and the MINIKUBE_ENABLE_GPU env var controls

If you want me to also document the separate Docker CLI considerations (for folks not using minikube), I can add a short note in the devtools help.

cnaples79 avatar Sep 17 '25 21:09 cnaples79