[core] aggregated metrics for `ray_tasks`/`ray_actors`

Open hongchaodeng opened this issue 6 months ago • 0 comments

Description

Currently the `ray_tasks`/`ray_actors` metrics can be very high in volume. This is fine for a single cluster, but for an aggregated platform view it can put excessive load on the Prometheus and Grafana servers.

For these aggregated views, we don't need to know the NAME, WorkerId, etc., but these tags lead to high cardinality in the output metrics. Because these metrics are currently of GAUGE type, dropping labels is not ideal either.

We should add new aggregated metrics for `ray_tasks`/`ray_actors`.
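
To illustrate the kind of aggregation being asked for, here is a minimal sketch in plain Python. It is not Ray's implementation and does not use any Ray API. The label names (`State`, `Name`, `WorkerId`), the sample values, and the helper `aggregate_gauge` are all hypothetical. The idea is to sum gauge samples over the high-cardinality labels before export, so that far fewer series reach Prometheus.

```python
from collections import defaultdict

# Hypothetical label names: the issue only mentions NAME and WorkerId.
HIGH_CARDINALITY_LABELS = frozenset({"Name", "WorkerId"})


def aggregate_gauge(samples, drop=HIGH_CARDINALITY_LABELS):
    """Sum gauge samples after removing the labels listed in `drop`.

    `samples` is an iterable of (labels_dict, value) pairs.
    Returns a dict mapping a sorted tuple of the remaining
    (label, value) pairs to the summed gauge value.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Keep only the low-cardinality labels as the series key.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        totals[key] += value
    return dict(totals)


# Made-up per-task samples, one series per (State, Name, WorkerId).
samples = [
    ({"State": "RUNNING", "Name": "f", "WorkerId": "w1"}, 2.0),
    ({"State": "RUNNING", "Name": "g", "WorkerId": "w2"}, 3.0),
    ({"State": "FINISHED", "Name": "f", "WorkerId": "w1"}, 5.0),
]

aggregated = aggregate_gauge(samples)
for key, total in sorted(aggregated.items()):
    print(dict(key), total)
```

In this sketch three input series collapse into two, one per `State`. With many task names and workers the reduction would be much larger, which is the point of a separate aggregated metric.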

Use case

No response

hongchaodeng, Aug 23 '24 00:08