text-generation-inference CPU and Memory Utilization for TGI

Feature request

This is a request to expose CPU and memory utilization metrics for TGI. This would help with autoscaling when the load reaches a certain limit.

Also, can anyone point me to the list of metrics TGI provides? I couldn't find it here: https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/metrics

Motivation

<>

Your contribution

<>

Jul 12 '24 13:07 snps-ravinu

Hi @snps-ravinu, thanks for your feedback! For CPU and memory utilization, it is probably better to use the metrics from the container runtime (if you are using k8s, that would be metrics-server for autoscaling). Is there any particular reason you would need TGI to report those?

We plan to document the metrics exposed by TGI to make them easier to use.
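
In the meantime, one way to see which metrics are exposed is to read the Prometheus-format text served by the `/metrics` route linked above and collect the names from its `# TYPE` lines. The snippet below is only a rough sketch: the helper names, the `SAMPLE` text, and the metric names inside it are illustrative assumptions, not an authoritative list, and the base URL passed to `fetch_metrics` depends on your own deployment.

```python
import urllib.request


def list_metric_names(exposition_text):
    """Return the sorted metric names declared on '# TYPE <name> <kind>' lines."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        # A Prometheus TYPE line looks like: "# TYPE <metric_name> <metric_kind>"
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return sorted(names)


def fetch_metrics(base_url):
    """Download the raw metrics text from <base_url>/metrics (base_url is an assumption)."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Illustrative sample only; the real output of a running server may differ.
SAMPLE = """\
# TYPE tgi_queue_size gauge
tgi_queue_size 3
# TYPE tgi_request_count counter
tgi_request_count 42
"""

if __name__ == "__main__":
    print(list_metric_names(SAMPLE))
```

Against a running server, something like `list_metric_names(fetch_metrics("http://localhost:8080"))` would list the names, assuming the server is reachable at that address.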

Jul 15 '24 08:07 Hugoch

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Aug 15 '24 01:08 github-actions[bot]