Mark McLoughlin
I thought it was about time to post an update on the latest status of this and note some TODOs.

### Status

The V1 engine frontend API server now has a Prometheus-compatible...
As a bit of a status update, here's how the example Grafana dashboard currently looks with a serving benchmark run like this: ``` $ python3 ./benchmarks/benchmark_serving.py --model meta-llama/Llama-3.1-8B-Instruct --tokenizer meta-llama/Llama-3.1-8B-Instruct...
What's nice about the above is that even though V1 does not have `vllm:num_requests_swapped` and `vllm:cpu_cache_usage_perc` (because V1 doesn't have swap-to-CPU preemption mode), it doesn't impact the user experience of...
Here's the latest on what's in V0 versus V1:

| In Both | In V0 Only | In V1 Only |
|---------|-----------|-----------|
| vllm:cache_config_info | vllm:cpu_cache_usage_perc #14136 | vllm:gpu_prefix_cache_hits #12592...
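For anyone wanting to regenerate a comparison like this, here is a minimal sketch, assuming you have saved the text from each engine's metrics endpoint. It reads metric family names from the `# TYPE <name> <type>` lines of the Prometheus text exposition format and splits them into the three columns. The function names, the sample strings, and the sample values and metric types are all hypothetical and hand-written for illustration; only the metric names come from this thread.

```python
def metric_families(exposition_text: str) -> set[str]:
    """Collect metric family names from `# TYPE <name> <type>` lines."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return names


def compare(v0_text: str, v1_text: str) -> dict[str, list[str]]:
    """Split metric families into both / V0-only / V1-only lists."""
    v0, v1 = metric_families(v0_text), metric_families(v1_text)
    return {
        "both": sorted(v0 & v1),
        "v0_only": sorted(v0 - v1),
        "v1_only": sorted(v1 - v0),
    }


# Hand-written samples (assumption): not real server output.
V0_SAMPLE = """\
# TYPE vllm:cache_config_info gauge
vllm:cache_config_info 1.0
# TYPE vllm:cpu_cache_usage_perc gauge
vllm:cpu_cache_usage_perc 0.0
# TYPE vllm:num_requests_swapped gauge
vllm:num_requests_swapped 0.0
"""

V1_SAMPLE = """\
# TYPE vllm:cache_config_info gauge
vllm:cache_config_info 1.0
# TYPE vllm:gpu_prefix_cache_hits counter
vllm:gpu_prefix_cache_hits 42.0
"""

if __name__ == "__main__":
    for column, names in compare(V0_SAMPLE, V1_SAMPLE).items():
        print(column, names)
```

Keying on the `# TYPE` lines rather than the sample lines avoids counting the per-series suffixes that histograms add (`_bucket`, `_sum`, `_count`) as separate metrics.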
Great catch! Yes indeed, `num_gpu_blocks` isn't currently available in the frontend, so we need some way of getting it from the engine. We currently return `gpu_cache_usage` in `SchedulerStats`: ``` @dataclass...
> I haven't reviewed closely, just added a few comments of things that I noticed. > > > @njhill are there any implications from #15977 we should consider? This is...
> When the metrics were originally added the contributor wasn't aware of the naming convention. When this was noticed it was decided that we would leave them as is so...
> This is a really valuable document, thank you for putting the time and effort into creating it!

Thank you!

> A couple of whitespace nits.
>
> My comments...
> You can cherrypick this commit [hmellor@1876f9c](https://github.com/hmellor/vllm/commit/1876f9cc5123f74239779d38005dc80f0c7552c3)

Thanks @hmellor, appreciate the help! TIL MyST!
pre-commit failure is a `yapf` failure that doesn't happen for me locally:

```
  File "/home/runner/.cache/pre-commit/repom9gt4aao/py_env-python3.12/lib/python3.12/site-packages/yapf_third_party/_ylib2to3/pygram.py", line 39, in <module>
    pattern_grammar = driver.load_grammar(_PATTERN_GRAMMAR_FILE)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.cache/pre-commit/repom9gt4aao/py_env-python3.12/lib/python3.12/site-packages/yapf_third_party/_ylib2to3/pgen2/driver.py", line 248, in load_grammar
    g.load(gp)
  File...
```