vllm
vllm copied to clipboard
[Perf] Optimize MRotaryEmbedding::get_input_positions performance by numba
Blocker
#16878
Qwen2/2.5-VL Benchmark
| input length | multi-modal data | main branch | this PR |
|---|---|---|---|
| 374 tokens | 1 (num images) x 324 (per image) image tokens | 0.305ms | 0.012ms |
| 3308 tokens | 10 (num images) x 324 (per image) image tokens | 1.994ms | 0.018ms |
| 5379 tokens | 1 (num images) x 5329 (per image) image tokens | 0.944ms | 0.016ms |
| 3290 tokens | 1 (num videos) x 10 (num temporal) x 324 (per frame) video tokens | 0.666ms | 0.016ms |
| 32450 tokens | 1 (num videos) x 100 (num temporal) x 324 (per frame) video tokens | 4.050ms | 0.063ms |
Qwen2.5-Omni Benchmark
| input length | multi-modal data | main branch | this PR |
|---|---|---|---|
| 374 tokens | 1 (num images) x 324 (per image) image tokens | 1.272ms | 0.018ms |
| 3308 tokens | 10 (num images) x 324 (per image) image tokens | 3.527ms | 0.024ms |
| 5379 tokens | 1 (num images) x 5329 (per image) image tokens | 1.367ms | 0.025ms |
| 3290 tokens | 1 (num videos) x 10 (num temporal) x 324 (per frame) video tokens | 1.329ms | 0.022ms |
| 32450 tokens | 1 (num videos) x 100 (num temporal) x 324 (per frame) video tokens | 1.797ms | 0.072ms |
| 3542 tokens | 1 (num videos) x 10 (num temporal) x 324 (per frame) video tokens + 250 audio tokens |
16.664ms | 0.022ms |
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @imkero.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
@imkero Thanks for the PR! This is amazing 🚀 Could you please resolve the merge conflict and the lint error?
@imkero Thanks for the PR! This is amazing 🚀 Could you please resolve the merge conflict and the lint error?
Sure! I will update this PR soon.
TODO:
- [x] fix merge conflicts & lint error
- [x] complete the PR description
- [x] enhance the error msg in assertions
- [x] add unit tests for assertions
- [x] refresh the benchmark result
- [x] e2e correctness test
And I will share my experience of using numba later.
@imkero Just so you know: To fix the CI failure, we should move numba from requirements/cuda.txt and requirements/rocm.txt to requirements/common.txt.
@imkero Just so you know: To fix the CI failure, we should move
numbafromrequirements/cuda.txtandrequirements/rocm.txttorequirements/common.txt.
Thanks for your remind, I have moved it to requirements/common.txt now.
@WoosukKwon I think this PR is ready for review now
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @imkero.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
I will take get_next_positions_tensor into consideration because they are reported to be time-costing as well in #17617
I have written an optimized version of get_next_input_positions_tensor which is zero copy and no lru cache.
Will add e2e benchmarking result like #17617
This PR continues the idea of #17617. Thanks @vadiklyutiy
Could you please take a look? @ywang96
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @imkero.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @imkero.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork