Mark McLoughlin comments

Results: 50 comments of Mark McLoughlin

[Feature][v1]: Add metrics support

I thought it was about time to update on the latest status of this and note some TODOs.

### Status

The v1 engine frontend API server now has a Prometheus-compatible...

[Feature][v1]: Add metrics support

As a bit of a status update, here's how the example Grafana dashboard currently looks with a serving benchmark run like this: ``` $ python3 ./benchmarks/benchmark_serving.py --model meta-llama/Llama-3.1-8B-Instruct --tokenizer meta-llama/Llama-3.1-8B-Instruct...

[Feature][v1]: Add metrics support

What's nice about the above is that even though V1 does not have `vllm:num_requests_swapped` and `vllm:cpu_cache_usage_perc` (because V1 doesn't have swap-to-CPU preemption mode), it doesn't impact the user experience of...

[Feature][v1]: Add metrics support

Here's the latest on what's in V0 versus V1:

| In Both | In V0 Only | In V1 Only |
|---------|-----------|-----------|
| vllm:cache_config_info | vllm:cpu_cache_usage_perc #14136 | vllm:gpu_prefix_cache_hits #12592...

[V1][Bugfix]: vllm v1 version metric num_gpu_blocks is None

Great catch! Yes indeed, `num_gpu_blocks` isn't currently available in the frontend, so we need some way of getting it from the engine. We currently return `gpu_cache_usage` in `SchedulerStats`: ``` @dataclass...

[V1][Bugfix]: vllm v1 version metric num_gpu_blocks is None

> I haven't reviewed closely, just added a few comments of things that I noticed. > > > @njhill is there any implications from #15977 we should consider? This is...

[v1][Metrics] Add design doc

> When the metrics were originally added the contributor wasn't aware of the naming convention. When this was noticed it was decided that we would leave them as is so...

[v1][Metrics] Add design doc

> This is a really valuable document, thank you for putting the time and effort into creating it!

Thank you!

> A couple of whitespace nits.
>
> My comments...

[v1][Metrics] Add design doc

> You can cherrypick this commit [hmellor@1876f9c](https://github.com/hmellor/vllm/commit/1876f9cc5123f74239779d38005dc80f0c7552c3)

Thanks @hmellor appreciate the help! TIL MyST!

[V1][Metrics] Handle preemptions

pre-commit failure is a `yapf` failure that doesn't happen for me locally: ``` File "/home/runner/.cache/pre-commit/repom9gt4aao/py_env-python3.12/lib/python3.12/site-packages/yapf_third_party/_ylib2to3/pygram.py", line 39, in pattern_grammar = driver.load_grammar(_PATTERN_GRAMMAR_FILE) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/runner/.cache/pre-commit/repom9gt4aao/py_env-python3.12/lib/python3.12/site-packages/yapf_third_party/_ylib2to3/pgen2/driver.py", line 248, in load_grammar g.load(gp) File...