[Serve][1/n] Add autoscaling prometheus metrics

Open abrarsheikh opened this issue 3 weeks ago • 0 comments

Docs preview: https://anyscale-ray--59220.com.readthedocs.build/en/59220/serve/monitoring.html#built-in-ray-serve-metrics

Fixes https://github.com/ray-project/ray/issues/59218

docs changes

[x] Refactored the table of all metrics (IMO Markdown is easier to read in code).
[x] Split the metrics table into categories, ordered by the typical request path.
[x] Included a stick diagram of the important metrics, showing where in the request lifecycle each metric is recorded.
[x] Ordered the metrics within the table by their order in the request path.

This PR adds the following new metrics:

    - ray_serve_deployment_target_replicas: Target number of replicas
        Tags: deployment, application
    - ray_serve_autoscaling_decision_replicas: Raw autoscaling decision (replica count) before bounds are applied
        Tags: deployment, application
    - ray_serve_autoscaling_total_requests: Total requests seen by the autoscaler
        Tags: deployment, application
    - ray_serve_autoscaling_policy_execution_time_ms: Policy execution time, in milliseconds
        Tags: deployment, application, policy_scope
    - ray_serve_autoscaling_replica_metrics_delay_ms: Replica metrics delay, in milliseconds
        Tags: deployment, application, replica
    - ray_serve_autoscaling_handle_metrics_delay_ms: Handle metrics delay, in milliseconds
        Tags: deployment, application, handle
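
To illustrate how ray_serve_autoscaling_decision_replicas and ray_serve_deployment_target_replicas might relate, here is a minimal sketch. It assumes the "bounds" are the deployment's min_replicas and max_replicas and that the target is simply the raw decision clamped into that range. The ReplicaBounds, apply_bounds, and gauges_for names are hypothetical and are not taken from this PR's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaBounds:
    """Hypothetical container for a deployment's replica limits."""

    min_replicas: int
    max_replicas: int


def apply_bounds(decision_replicas: int, bounds: ReplicaBounds) -> int:
    """Clamp a raw autoscaling decision into [min_replicas, max_replicas]."""
    return max(bounds.min_replicas, min(decision_replicas, bounds.max_replicas))


def gauges_for(decision_replicas: int, bounds: ReplicaBounds) -> dict:
    """Return the values the two gauges would report under this sketch's assumption."""
    return {
        # Raw policy output, recorded before any clamping.
        "ray_serve_autoscaling_decision_replicas": decision_replicas,
        # Value after clamping (assumed to be what the target gauge reports).
        "ray_serve_deployment_target_replicas": apply_bounds(decision_replicas, bounds),
    }
```

Under this assumption, a gap between the two gauges would suggest that the policy wants more (or fewer) replicas than the configured bounds allow.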

Dec 06 '25 06:12 abrarsheikh