Ronen Schaffer

Results: 9 comments of Ronen Schaffer

@rib-2, I highly value your opinion. Would you please review my pull request?

@simon-mo Could you please review this PR?

I've added 3 new panels at the bottom of the Grafana dashboard showcasing the metrics `vllm:request_prompt_tokens_bucket`, `vllm:request_generation_tokens_bucket` and `vllm:request_success_total`. I haven't included panels for `n` and `best_of` since the current...
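For context, the `_bucket` series referenced above are produced by Prometheus histograms: every observation increments each cumulative bucket whose upper bound (`le`) is at or above the observed value. A minimal, self-contained sketch of that mechanism (the bucket boundaries and example values here are illustrative assumptions, not vLLM's actual configuration):

```python
# Illustrative sketch (not vLLM code) of how Prometheus histogram series such
# as vllm:request_prompt_tokens_bucket are built: each observation increments
# every cumulative bucket whose upper bound ("le") is >= the observed value.

BUCKET_BOUNDS = [1, 50, 100, 500, 1000, float("inf")]  # assumed boundaries

def observe(buckets: dict, value: float) -> None:
    """Record one observation into cumulative le-buckets."""
    for bound in BUCKET_BOUNDS:
        if value <= bound:
            buckets[bound] = buckets.get(bound, 0) + 1

buckets = {}
for prompt_tokens in (30, 120, 800):  # three example requests
    observe(buckets, prompt_tokens)

# Cumulative counts, analogous to vllm:request_prompt_tokens_bucket{le="..."}:
for bound in BUCKET_BOUNDS:
    print(f'le="{bound}": {buckets.get(bound, 0)}')
```

Because the buckets are cumulative, a Grafana panel typically plots `rate()` over these series or derives quantiles from them rather than graphing the raw counters.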

I see. I'll do my best to deliver ASAP.

1. I've incorporated most of the changes from https://github.com/ronensc/vllm/pull/1 into this PR.
2. I'm not sure whether the [assumption](https://github.com/ronensc/vllm/pull/1/files#diff-9d1cd5050a7ec1e588cae646f3f95ca134cffbd8b41d6655c6a14de70117b869R459-R461) in `maybe_get_last_latency()` and `maybe_set_first_token_time()` is correct. Both methods are called after...

In the current state of the PR, some of the metrics are still inaccurate with chunked prefill. Before addressing the chunked-prefill issue, could we please merge this PR up to commit https://github.com/vllm-project/vllm/pull/2764/commits/5ded719d9cfd09b300562ec1d4df21fb0a4e79a3...

For future reference: the root-cause analysis of the test failure was conducted by @dgrove-oss and can be found here: https://github.com/project-codeflare/multi-cluster-app-dispatcher/pull/691#issuecomment-1832070420

@rib-2 @HMellor I asked about the decision to use `aioprometheus` instead of the official package in the original PR; here is a link to the response I received: https://github.com/vllm-project/vllm/pull/1890#issuecomment-1881491178

@jotak Just to make sure that I understand your configuration format, consider the following values:

```
sampling:
  syn: 0
  fin: 10
  default: 50
```

It should be interpreted as: -...
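My reading of that format can be sketched as follows. This is a hypothetical Python illustration under my assumed semantics; the `sampling_rate` helper and its fallback logic are my own, not the project's code:

```python
# Hypothetical resolution of the per-flag sampling config quoted above.
# The keys and values mirror the example; the semantics are assumptions.

SAMPLING = {
    "syn": 0,       # assumed: 0 disables sampling for SYN flows
    "fin": 10,      # assumed: sample 1 out of every 10 FIN flows
    "default": 50,  # assumed: 1:50 fallback for all other flags
}

def sampling_rate(flag: str) -> int:
    """Return the configured sampling rate for a flag, or the default."""
    return SAMPLING.get(flag, SAMPLING["default"])

print(sampling_rate("syn"), sampling_rate("fin"), sampling_rate("rst"))  # 0 10 50
```

If that matches your intent, the `default` key acts as a catch-all for any flag not explicitly listed.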