awesome-mlops
Add GPU-based HPA for Triton Inference Server
Adds a guide for deploying NVIDIA Triton Inference Server on Kubernetes with a Horizontal Pod Autoscaler that scales on GPU utilization metrics collected via DCGM Exporter and Prometheus.
The guide covers:
- Triton Inference Server deployment
- GPU metrics collection with DCGM Exporter
- Prometheus integration for custom metrics
- HPA configuration for automatic scaling based on GPU utilization
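The scaling step above can be sketched as an `autoscaling/v2` HPA manifest. This is a minimal illustration, assuming the DCGM Exporter metric `DCGM_FI_DEV_GPU_UTIL` has been exposed as a per-pod custom metric through prometheus-adapter; the resource names and thresholds are hypothetical, not taken from the guide.

```yaml
# Hypothetical HPA sketch: scales a Triton deployment on average GPU
# utilization. Assumes prometheus-adapter exposes DCGM_FI_DEV_GPU_UTIL
# (a standard DCGM Exporter field) on the custom.metrics.k8s.io API.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-gpu-hpa          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-inference-server  # illustrative deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL
      target:
        type: AverageValue
        averageValue: "80"      # scale out above ~80% average GPU utilization
```

Using a `Pods` metric with `AverageValue` means the HPA averages GPU utilization across Triton replicas, so a sustained load spike adds pods rather than reacting to a single hot GPU.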