CPU and Memory Utilization for TGI
Feature request
This is a request to expose CPU and memory utilization metrics for TGI. This would be helpful for autoscaling when the load reaches a certain limit.
Also, can anyone point me to the list of metrics that TGI provides? I couldn't find it here: https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/metrics
Motivation
<>
Your contribution
<>
Hi @snps-ravinu, thanks for your feedback!
For CPU and memory utilization, it is probably better to use the metrics from the container runtime (if you are using k8s, that would be metrics-server for autoscaling). Is there any particular reason you need TGI to report those?
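To illustrate how utilization reported by metrics-server can drive autoscaling, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name and the `min_replicas`/`max_replicas` parameters are hypothetical names chosen for this example, and the sketch ignores the autoscaler's tolerance band and stabilization window.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,    # assumption: lower bound, as configured on the autoscaler
    max_replicas: int = 10,   # assumption: upper bound, as configured on the autoscaler
) -> int:
    """Approximate the HPA scaling rule for a utilization metric (e.g. CPU %)."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Core rule: scale the replica count by the ratio of observed to target usage.
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two pods averaging 90% CPU against a 60% target.
    print(desired_replicas(2, 90.0, 60.0))
```

With this approach the container runtime supplies the CPU and memory numbers, so TGI itself does not have to report them.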
We plan to document the metrics exposed by TGI to make them easier to use.
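Until that documentation exists, one way to see which metrics a running instance exposes is to read its Prometheus-format metrics endpoint directly. The sketch below assumes the server is reachable at `http://localhost:8080` and serves the endpoint at `/metrics`; both are assumptions to adjust for your deployment, and the helper names are hypothetical.

```python
import urllib.request


def metric_names(exposition_text: str) -> list[str]:
    """Return the sorted, unique metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="value"} 1.0  or  name 1.0
        name = line.split("{", 1)[0].split()[0]
        names.add(name)
    return sorted(names)


def fetch_metric_names(base_url: str = "http://localhost:8080") -> list[str]:
    """Fetch the metrics endpoint (assumed path: /metrics) and list metric names."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return metric_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for name in fetch_metric_names():
        print(name)
```

Histogram metrics appear as several series (with `_bucket`, `_sum`, and `_count` suffixes), so the printed list can be longer than the number of logical metrics.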
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.