[CI/Build][v1] vLLM v1 automatic benchmarking
This PR extends the performance benchmark suite to cover both v0 and v1: the latency, throughput, and fixed-QPS serving tests will first run with v0 and then with v1. The v1 results will be recorded and processed in the same way as the v0 results, and their filenames will have _v1 appended as a suffix.
The file structure will look like this:
results/
|--- benchmark_results.json
|--- benchmark_results.md
|--- benchmark_results_v1.json
|--- benchmark_results_v1.md
|--- latency_llama8B_tp1.commands
|--- latency_llama8B_tp1.json
|--- latency_llama8B_tp1_v1.commands
|--- latency_llama8B_tp1_v1.json
|--- ...
|--- serving_llama8B_tp1_sharegpt_qps_1.commands
|--- serving_llama8B_tp1_sharegpt_qps_1.json
|--- serving_llama8B_tp1_sharegpt_qps_1_v1.commands
|--- serving_llama8B_tp1_sharegpt_qps_1_v1.json
|--- ...
|--- throughput_llama8B_tp1.commands
|--- throughput_llama8B_tp1.json
|--- throughput_llama8B_tp1_v1.commands
|--- throughput_llama8B_tp1_v1.json
|--- ...
benchmark_results.json and benchmark_results.md will contain the v0 results as usual, while benchmark_results_v1.json and benchmark_results_v1.md will contain the results of the same tests run with v1. The results have been verified by running v0 and v1 separately.
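For illustration, here is a minimal, hypothetical sketch of the per-test flow described above, assuming v1 is selected via the VLLM_USE_V1 environment variable; the actual CI scripts, model name, and flags in this PR may differ.

```python
# Hypothetical sketch (not the actual CI script) of running one benchmark
# twice, once against v0 and once against v1, writing the v1 results to a
# "_v1"-suffixed file. Script path, model name, and flags are illustrative.
import os
import subprocess


def run_latency_benchmark(test_name: str, use_v1: bool) -> None:
    suffix = "_v1" if use_v1 else ""
    # Assumption: the v1 engine is toggled via VLLM_USE_V1.
    env = dict(os.environ, VLLM_USE_V1="1" if use_v1 else "0")
    subprocess.run(
        [
            "python3",
            "benchmarks/benchmark_latency.py",
            "--model", "meta-llama/Meta-Llama-3-8B",
            "--output-json", f"results/{test_name}{suffix}.json",
        ],
        env=env,
        check=True,
    )


# v0 pass first, then the same test again with the v1 engine enabled.
for use_v1 in (False, True):
    run_latency_benchmark("latency_llama8B_tp1", use_v1)
```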
This approach keeps the performance dashboard unaffected. As noted in simon-mo/vllm-community-dashboard, the dashboard processes the benchmark_results.json file; that file remains unchanged for v0, and a new file, benchmark_results_v1.json, is generated alongside it. After this PR is merged, the dashboard repository can easily be updated to visualize the v1 results as well.
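Until the dashboard is updated, the two summary files can be compared locally with a short script. This is a hypothetical sketch, not part of the PR; it assumes each file holds a list of per-test records with a "Test name" field, which may differ from the actual results schema.

```python
# Hypothetical local comparison of the v0 and v1 summary files; not part of
# this PR. Assumes each file is a list of per-test records keyed by
# "Test name", which is an assumption about the schema.
import json

with open("results/benchmark_results.json") as f:
    v0_results = json.load(f)
with open("results/benchmark_results_v1.json") as f:
    v1_results = json.load(f)

v1_by_test = {r.get("Test name"): r for r in v1_results}
for record in v0_results:
    name = record.get("Test name")
    if name in v1_by_test:
        print(f"{name}: v0={record} v1={v1_by_test[name]}")
```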
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Shaoting-Feng.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
@ywang96 Thank you for assigning yourself to the review! After checking the Buildkite performance-benchmark pipeline, I noticed that it is stuck at the "Wait for container to be ready" step, and the error message is: "Waiting for image to be available...".
I'll take a look later today!
Thanks!!
I just merged a fix for the perf benchmark. Can you merge this branch with main and try again?
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Shaoting-Feng.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork