vllm
[Model] Add Support for Grok2
Purpose
To address https://github.com/vllm-project/vllm/issues/23557
Test Plan
Test Result
This is a draft PR since the work is still in progress and the implementation currently produces incorrect results.
- Tokenizer support: tokenizer.tok.json is currently not supported. As a workaround, you can use the Hugging Face–compatible tokenizer available here:
- FlashAttention issue: I encountered the following error when testing:
RuntimeError: Worker failed with error 'This flash attention build does not support tanh softcapping.'
To work around this, I used flashinfer. I’m not sure if this error is specific to my environment.
- Response correctness: The generated responses are still incorrect; I haven’t had the chance to fully debug this yet.
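For readers unfamiliar with the term in the FlashAttention error above, here is a minimal pure-Python sketch of what tanh softcapping generally means: raw attention scores are squashed into the open interval (-cap, cap) before the softmax. This is an illustration only, not the kernel code from FlashAttention, flashinfer, or this PR. The function names and the cap value of 30.0 are hypothetical and were chosen just for the example.

```python
import math


def tanh_softcap(scores, cap):
    """Squash raw attention scores into (-cap, cap) via cap * tanh(score / cap)."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(s / cap) for s in scores]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one query against four keys.
raw_scores = [-100.0, 0.0, 2.0, 100.0]

# Hypothetical cap value, used only for illustration.
capped = tanh_softcap(raw_scores, cap=30.0)

# Attention weights computed from the capped scores.
weights = softmax(capped)
```

Small scores pass through almost unchanged, while extreme scores saturate near the cap. An attention kernel built without this step cannot reproduce the model's attention weights, which may be why the backend refuses to run rather than silently skipping it.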
@igor-susic1 @BranZhai @Crucifixion-Fxl — It would be great if you could take a look and help out whenever you get the chance. Thanks so much! 🙏
Essential Elements of an Effective PR Description Checklist
- [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before and after, or e2e results
- [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
- [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wenchen76.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Documentation preview: https://vllm--24286.org.readthedocs.build/en/24286/
Hey there, I'm curious about the status of this PR. Is this description still accurate?