
[Bugfix] Fix LongRoPE bug

Open garg-amit opened this pull request 1 year ago • 21 comments

This PR addresses the issue by using the number of original input tokens to decide between the short and long scaling factors, so that the switch cannot happen in the middle of a single generation request: if the number of original input tokens is less than original_max_position_embeddings, the short factors are used; otherwise, the long factors are used.
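
A minimal sketch of that selection logic, using hypothetical names (the actual vLLM implementation in the LongRoPE rotary embedding code differs in detail):

# Illustrative sketch only, not the actual vLLM code: choose the LongRoPE
# rescale factors once per request from the original input length, so the
# short/long switch cannot flip partway through one generation request.
def select_longrope_factors(num_orig_input_tokens: int,
                            original_max_position_embeddings: int,
                            short_factor: list[float],
                            long_factor: list[float]) -> list[float]:
    # The prompt fits inside the original context window: use short factors.
    if num_orig_input_tokens < original_max_position_embeddings:
        return short_factor
    # The prompt already exceeds the original window: use long factors for
    # the whole request, no matter how many tokens are generated afterwards.
    return long_factor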

FIX #6438

Models tested with this PR:

  • microsoft/Phi-3.5-mini-instruct
  • microsoft/Phi-3-mini-4k-instruct
  • microsoft/Phi-3-small-8k-instruct
  • microsoft/Phi-3-small-128k-instruct
  • microsoft/Phi-3.5-MoE-instruct
  • OpenVINO/Phi-3-mini-128k-instruct-int4-ov

Backends tested:

  • XFORMERS
  • FLASHINFER
  • FLASH_ATTN
  • OPENVINO
  • ROCM_FLASH
PR Checklist

Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain code quality and improves the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following prefixes:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Model] for adding a new model or improving an existing model. Model name should appear in the title.
  • [Frontend] for changes to the vLLM frontend (e.g., OpenAI API server, LLM class, etc.).
  • [Kernel] for changes affecting CUDA kernels or other compute kernels.
  • [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.).
  • [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • We adhere to the Google Python style guide and Google C++ style guide.
  • Pass all linter checks. Please use format.sh to format your code.
  • The code needs to be well documented to ensure future contributors can easily understand it.
  • Include sufficient tests to ensure the project stays correct and robust. This includes both unit tests and integration tests.
  • Please add documentation to docs/source/ if the PR modifies user-facing behavior of vLLM. This helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag the PR with rfc-required and may not review it.

What to Expect for the Reviews

The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

  • After the PR is submitted, the PR will be assigned to a reviewer. Every reviewer will pick up the PRs based on their expertise and availability.
  • After the PR is assigned, the reviewer will provide a status update every 2-3 days. If the PR is not reviewed within 7 days, please feel free to ping the reviewer or the vLLM team.
  • After the review, the reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
  • Please respond to all comments within a reasonable time frame. If a comment isn't clear or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion.

Thank You

Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!

@robertgshaw2-neuralmagic @mgoin

garg-amit avatar Sep 07 '24 00:09 garg-amit

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

github-actions[bot] avatar Sep 07 '24 00:09 github-actions[bot]

@garg-amit is this PR ready for review?

robertgshaw2-redhat avatar Sep 08 '24 12:09 robertgshaw2-redhat

@robertgshaw2-neuralmagic yes, the PR is ready for review now

garg-amit avatar Sep 09 '24 20:09 garg-amit

@garg-amit what would need to happen to trigger this feature? Is there anything inside the LM Eval harness that we could use to trigger and confirm this is working correctly in our CI?

robertgshaw2-redhat avatar Sep 09 '24 20:09 robertgshaw2-redhat

@robertgshaw2-neuralmagic This fix addresses the scenario where the input length is shorter than original_max_position_embeddings but the combined input and output length exceeds it. I'm not sure which evals would trigger this case, but to check parity in other scenarios, MMLU and GSM8K can be used for short-context evals and Qasper for long-context evals.
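
For reference, a minimal sketch of how this boundary case can be reproduced locally with vLLM's offline API (assumes a local GPU; Phi-3.5-mini-instruct uses original_max_position_embeddings = 4096, and the placeholder prompt needs to be replaced with real long text, e.g. the example below):

# Illustrative repro sketch, not a CI test: the prompt stays just under the
# original 4K window, and max_tokens pushes the total sequence past it.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-3.5-mini-instruct")
long_prompt = "..."  # placeholder for a roughly 4K-token prompt
params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate([long_prompt], params)
print(outputs[0].outputs[0].text)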

garg-amit avatar Sep 10 '24 05:09 garg-amit

@robertgshaw2-neuralmagic we performed some local testing to trigger the above scenario with Phi-3.5-mini-instruct. Here are the outputs before and after the fix.

Prompt:

Microsoft Corporation is an American multinational corporation and technology company headquartered in Redmond, Washington.[2] Its best-known software products are the Windows line of operating systems, the Microsoft 365 suite of productivity applications, the Azure cloud computing platform and the Edge web browser. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers. Microsoft ranked No. 14 in the 2022 Fortune 500 rankings of the largest United States corporations by total revenue;[3] and it was the world's largest software maker by revenue in 2022 according to Forbes Global 2000. It is considered one of the Big Five American information technology companies, alongside Alphabet (parent company of Google), Amazon, Apple, and Meta (parent company of Facebook).
...
[62] The European Union imposed another fine of €899 million ($1.4 billion) for Microsoft's lack of compliance with the March 2004 judgment on February 27, 2008, saying that the company charged rivals unreasonable prices for key information about its workgroup and backoffice servers.[63] Microsoft stated that it was in compliance and that "these fines are about the past issues that have been resolved".[64] 2007 also saw the creation of a multi-core unit at Microsoft, following the steps of server companies such as Sun and IBM.

Gates retired

Prompt length: 4093

Output before the fix:

 as chief software architectingessis.
as.




00orx








0.0omist.0aringing, with.
.
.





...


of0X.

game,


of000000000000s.0000000000000000000.
0X


s
0000X00XX



X0s of
of
of0orofof.0000xxon, of000xx.,
000


time,00o
XXXor0s.
0000XXXXX00

sXor.or.0s000s.
of.0on.
of.





000s
000oXNT.

00



such.
onon.

XorXXs00

After the fix:

 as chief software architect in June 2006, and as chairman of the board in February 2008.[65] In 2008, Microsoft released Windows Phone 7, a new mobile operating system for smartphones, and the Xbox 360 S, a revised version of the Xbox 360.[66][67] The company also released the Zune HD, a digital media player with video playback, and the Surface, a tablet computer.[68]

In 2009, Microsoft released Windows 7, a new version of Windows that was generally well received.[69] The company also released the Xbox 360 Wireless Gaming Receiver, a wireless adapter for the Xbox 360.[70]

In 2010, Microsoft released the Kinect, a motion-sensing input device for the Xbox 360 and Xbox One.[71] The company also released the Surface 2, a tablet computer with a new design and improved hardware.[72]

In 2011, Microsoft released Windows 8, a new version of Windows that was generally well received.[73] The company also released the Xbox 360 E, a revised version of the Xbox 360.[74]
...
Microsoft released

garg-amit avatar Sep 11 '24 16:09 garg-amit

@robertgshaw2-neuralmagic can I please get a review?

garg-amit avatar Sep 17 '24 05:09 garg-amit

@robertgshaw2-neuralmagic can I please get a review?

Yes. Will get this in for v0.6.2

robertgshaw2-redhat avatar Sep 17 '24 12:09 robertgshaw2-redhat

@robertgshaw2-neuralmagic can I please get a review?

Yes. Will get this in for v0.6.2

@garg-amit - I was waiting for tests to be green to merge. Are you planning to address them?

robertgshaw2-redhat avatar Sep 17 '24 12:09 robertgshaw2-redhat

@robertgshaw2-neuralmagic I’ve fixed most of the test cases. The remaining test cases seem unrelated to the current PR. Please let me know if there is anything else that needs to be fixed.

garg-amit avatar Sep 18 '24 07:09 garg-amit

@robertgshaw2-neuralmagic I’ve fixed most of the test cases. The remaining test cases seem unrelated to the current PR. Please let me know if there is anything else that needs to be fixed.

Thanks. Will review and merge tomorrow

robertgshaw2-redhat avatar Sep 20 '24 02:09 robertgshaw2-redhat

AMD CI has a broken node. Okay to ignore the issues

robertgshaw2-redhat avatar Sep 21 '24 21:09 robertgshaw2-redhat

@robertgshaw2-neuralmagic The PR is ready for another look.

garg-amit avatar Sep 25 '24 16:09 garg-amit

@robertgshaw2-neuralmagic Please let me know if there is anything else that needs to be addressed.

garg-amit avatar Oct 01 '24 05:10 garg-amit

Was OOO last week. My apologies. Will review today

robertgshaw2-redhat avatar Oct 01 '24 09:10 robertgshaw2-redhat

@robertgshaw2-neuralmagic ping

garg-amit avatar Oct 06 '24 20:10 garg-amit

@zhuohan123 @youkaichao @WoosukKwon could you help review this PR?

garg-amit avatar Oct 15 '24 16:10 garg-amit

Im so sorry @garg-amit - this slipped off my radar. Will give final review tonight

robertgshaw2-redhat avatar Oct 15 '24 16:10 robertgshaw2-redhat

LGTM! I pushed to your branch to get the CI green. Sorry again about the delay.

robertgshaw2-redhat avatar Oct 17 '24 16:10 robertgshaw2-redhat

No worries! :) Thank you for reviewing, @robertgshaw2-neuralmagic!

garg-amit avatar Oct 17 '24 17:10 garg-amit

Hey @garg-amit, it looks like this is causing legitimate issues in the CI:

[2024-10-17T19:15:39Z]         # Attention metadata.
[2024-10-17T19:15:39Z] >       attn_metadata = self.attn_metadata_builder.build(
[2024-10-17T19:15:39Z]             seq_lens, query_lens, num_orig_input_tokens_list,
[2024-10-17T19:15:39Z]             cuda_graph_pad_size, batch_size)
[2024-10-17T19:15:39Z] E       TypeError: PlaceholderAttentionMetadataBuilder.build() takes 5 positional arguments but 6 were given

robertgshaw2-redhat avatar Oct 18 '24 15:10 robertgshaw2-redhat

@robertgshaw2-neuralmagic I’ve fixed the above test case. However, the remaining test cases appear to be unrelated to the current PR. I tried to resolve them, but I couldn’t. Please let me know if those need to be resolved. I would appreciate your help.
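
For context, the error above is a plain signature mismatch: the call site now passes num_orig_input_tokens_list, so every attention metadata builder's build() must accept the extra argument. A hedged sketch of that kind of fix (illustrative only; the real vLLM class is more involved):

class PlaceholderAttentionMetadataBuilder:
    # Illustrative sketch, not the actual vLLM class: accept the new
    # num_orig_input_tokens_list argument so the shared call site works;
    # builders that do not need it can simply ignore it.
    def build(self, seq_lens, query_lens, num_orig_input_tokens_list,
              cuda_graph_pad_size, batch_size):
        ...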

garg-amit avatar Oct 28 '24 06:10 garg-amit

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @garg-amit.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Nov 06 '24 19:11 mergify[bot]

@robertgshaw2-neuralmagic when are you planning to merge this PR?

garg-amit avatar Nov 10 '24 23:11 garg-amit

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @garg-amit.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Nov 19 '24 03:11 mergify[bot]

@garg-amit @robertgshaw2-neuralmagic Is there an update on this PR?

DarshanDeshpande avatar Nov 27 '24 16:11 DarshanDeshpande

Awaiting Robert's response. @robertgshaw2-neuralmagic, could you please confirm if DCO is a hard requirement? While rebasing, I'm being prompted to sign off on commits that I didn't author. If required, I can create a new PR with the same changes.

garg-amit avatar Nov 27 '24 17:11 garg-amit

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @garg-amit.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Nov 30 '24 07:11 mergify[bot]

Awaiting Robert's response. @robertgshaw2-neuralmagic, could you please confirm if DCO is a hard requirement? While rebasing, I'm being prompted to sign off on commits that I didn't author. If required, I can create a new PR with the same changes.

DCO can be overridden. Don't worry about rebasing.

robertgshaw2-redhat avatar Dec 13 '24 22:12 robertgshaw2-redhat

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @garg-amit.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Jan 31 '25 05:01 mergify[bot]