
CPU and Memory Utilization for TGI

Open snps-ravinu opened this issue 1 year ago • 1 comment

Feature request

This is a request to expose CPU and memory utilization metrics for TGI. These would be helpful for autoscaling once the load reaches a certain limit.

Also, can anyone help me with the list of metrics provided? I couldn't find it here: https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/metrics

Motivation

<>

Your contribution

<>

snps-ravinu avatar Jul 12 '24 13:07 snps-ravinu

Hi @snps-ravinu, thanks for your feedback! For CPU and memory utilization, it is probably better to use the metrics from the container runtime (if you are using k8s, that would be metrics-server for autoscaling). Is there any particular reason you would need TGI to report those?
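
As a hedged illustration of that suggestion: the sketch below uses the official Kubernetes Python client to create a HorizontalPodAutoscaler that scales a TGI Deployment on CPU utilization reported by metrics-server. The Deployment name `tgi`, the `default` namespace, and the 70% target are placeholders, not values from this thread.

```python
# Sketch: autoscale a TGI Deployment on CPU utilization via metrics-server.
# Assumes the official `kubernetes` Python client, a cluster with
# metrics-server installed, and a Deployment named "tgi" (hypothetical name).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="tgi-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        # Target the TGI Deployment (name is a placeholder).
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="tgi"
        ),
        min_replicas=1,
        max_replicas=4,
        metrics=[
            # CPU utilization comes from metrics-server, not from TGI itself.
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```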

We plan to document the metrics exposed by TGI to make them easier to use.
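
In the meantime, a minimal sketch for seeing what is currently exposed is to read the Prometheus text format served at TGI's `/metrics` endpoint. The URL below is an assumption; point it at your own deployment's host and port.

```python
# Sketch: list the metrics a running TGI instance exposes by parsing the
# Prometheus text format served at /metrics. The URL is an assumption;
# adjust it to your own TGI deployment.
import requests

resp = requests.get("http://localhost:8080/metrics", timeout=10)
resp.raise_for_status()

# "# HELP <name> <description>" lines name and describe each metric.
for line in resp.text.splitlines():
    if line.startswith("# HELP "):
        parts = line.split(" ", 3)
        name = parts[2]
        description = parts[3] if len(parts) > 3 else ""
        print(f"{name}: {description}")
```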

Hugoch avatar Jul 15 '24 08:07 Hugoch

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

github-actions[bot] avatar Aug 15 '24 01:08 github-actions[bot]