
Add GPU-based HPA for Triton Inference Server

Open · uzunenes opened this issue 4 months ago · 0 comments

Adds a tutorial/guide for deploying NVIDIA Triton Inference Server on Kubernetes with Horizontal Pod Autoscaling (HPA) driven by GPU utilization metrics, collected via DCGM Exporter and exposed through Prometheus.

The guide covers:

  • Triton Inference Server deployment
  • GPU metrics collection with DCGM Exporter
  • Prometheus integration for custom metrics
  • HPA configuration for automatic scaling based on GPU utilization
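For readers skimming the issue, the last step above can be sketched as an `autoscaling/v2` HPA manifest. This is a minimal illustration, not the guide's exact config: the Deployment name `triton-server`, the HPA name `triton-hpa`, and the replica bounds and 80% threshold are assumptions; `DCGM_FI_DEV_GPU_UTIL` is the standard DCGM Exporter GPU-utilization metric, and it must already be exposed as a per-pod custom metric (e.g. via Prometheus Adapter) for the HPA to see it.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-hpa            # assumed name for illustration
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-server       # assumed Triton Deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: DCGM_FI_DEV_GPU_UTIL   # DCGM Exporter metric, via Prometheus Adapter
        target:
          type: AverageValue
          averageValue: "80"           # scale out when avg GPU utilization exceeds 80%
```

Using a `Pods`-type metric averages GPU utilization across the Triton replicas, so the HPA adds pods when the fleet-wide average crosses the target rather than reacting to a single busy GPU.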

uzunenes · Dec 07 '25 11:12