Ronen Schaffer comments

Results 16 comments of


                                            Ronen Schaffer

Add more Prometheus metrics

@rib-2, I highly value your opinion. Would you please review my pull request?

Add more Prometheus metrics

@simon-mo Could you please review this PR?

I've added 3 new panels at the bottom of the Grafana dashboard showcasing the metrics `vllm:request_prompt_tokens_bucket`, `vllm:request_generation_tokens_bucket` and `vllm:request_success_total`. I haven't included panels for `n` and `best_of` since the current...

Add more Prometheus metrics

I see. I'll do my best to deliver ASAP

Add more Prometheus metrics

1. I've incorporated most of the changes from https://github.com/ronensc/vllm/pull/1 into this PR. 2. I'm not sure the whether [assumption](https://github.com/ronensc/vllm/pull/1/files#diff-9d1cd5050a7ec1e588cae646f3f95ca134cffbd8b41d6655c6a14de70117b869R459-R461) in `maybe_get_last_latency()` and `maybe_set_first_token_time()` is correct. Both methods are called after...

Add more Prometheus metrics

In current state of the PR, some of the metrics are still inaccurate in chunked_prefill. Before addressing the chunked_prefill issue, could we please merge this PR up to commit https://github.com/vllm-project/vllm/pull/2764/commits/5ded719d9cfd09b300562ec1d4df21fb0a4e79a3...

Ronen Schaffer

Add more Prometheus metrics

Add more Prometheus metrics

Add more Prometheus metrics

Add more Prometheus metrics

Add more Prometheus metrics

Add more Prometheus metrics

Skipping MCAD CPU Preemption Test

Rs/feature/metrics

TCP flag-based sampling ("Smart sampling")

Add `go vet` to the Makefile and fix its errors