awesome-mlops
Add GPU-based HPA for Triton Inference Server
Adds a guide for deploying NVIDIA Triton Inference Server on Kubernetes with a Horizontal Pod Autoscaler that scales on GPU utilization metrics collected via DCGM Exporter and Prometheus.
The guide covers:
- Triton Inference Server deployment
- GPU metrics collection with DCGM Exporter
- Prometheus integration for custom metrics
- HPA configuration for automatic scaling based on GPU utilization
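The scaling step above can be sketched as an `autoscaling/v2` HPA manifest. This is a minimal illustration, assuming the DCGM Exporter metric `DCGM_FI_DEV_GPU_UTIL` has been exposed as a per-pod custom metric through prometheus-adapter; the resource names and thresholds are hypothetical, not taken from the guide.

```yaml
# Hypothetical HPA sketch: scales a Triton deployment on average GPU
# utilization. Assumes prometheus-adapter exposes DCGM_FI_DEV_GPU_UTIL
# (a standard DCGM Exporter field) on the custom.metrics.k8s.io API.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-gpu-hpa          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-inference-server  # illustrative deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL
      target:
        type: AverageValue
        averageValue: "80"      # scale out above ~80% average GPU utilization
```

Using a `Pods` metric with `AverageValue` means the HPA averages GPU utilization across Triton replicas, so a sustained load spike adds pods rather than reacting to a single hot GPU.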