
[Model] Add Support for Grok2

Open wenchen76 opened this issue 3 months ago • 4 comments

Purpose

To address https://github.com/vllm-project/vllm/issues/23557

Test Plan

Test Result

This is a draft PR since the work is still in progress and the implementation currently produces incorrect results.

  • Tokenizer support: tokenizer.tok.json is currently not supported. As a workaround, you can use the Hugging Face–compatible tokenizer available here:

  • FlashAttention issue: I encountered the following error when testing:

RuntimeError: Worker failed with error 'This flash attention build does not support tanh softcapping.'

To work around this, I used flashinfer. I’m not sure if this error is specific to my environment.

  • Response correctness: The generated responses are still incorrect, and I haven’t had the chance to fully debug this yet.
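
For readers unfamiliar with the term in the FlashAttention error above: tanh soft-capping generally means squashing attention scores into the open interval (-cap, cap) via `cap * tanh(score / cap)` before the softmax. The sketch below only illustrates that formula in plain Python. The function name `softcap` and the cap value of 30.0 are illustrative assumptions, not taken from this PR or from the Grok2 configuration.

```python
import math


def softcap(scores, cap):
    """Apply tanh soft-capping to each score: cap * tanh(score / cap).

    Hypothetical helper for illustration; not part of vLLM.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(s / cap) for s in scores]


# Illustrative inputs only; the cap used by Grok2 is not stated in this PR.
raw_scores = [-100.0, -1.0, 0.0, 1.0, 100.0]
capped = softcap(raw_scores, cap=30.0)

# Small scores pass through almost unchanged; large ones saturate near +/-cap.
print(capped)
```

An attention kernel that lacks this step cannot run a model that requires it, which is consistent with the error message above. At the time of this PR, vLLM could typically be pointed at a different attention backend with the `VLLM_ATTENTION_BACKEND` environment variable (for example `FLASHINFER`), though whether that is how the author switched backends is an assumption.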

@igor-susic1 @BranZhai @Crucifixion-Fxl — It would be great if you could take a look and help out whenever you get the chance. Thanks so much! 🙏


Essential Elements of an Effective PR Description Checklist
  • [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [ ] The test plan, such as providing test command.
  • [ ] The test results, such as pasting the results comparison before and after, or e2e results
  • [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

wenchen76 avatar Sep 05 '25 00:09 wenchen76

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wenchen76.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Sep 05 '25 00:09 mergify[bot]

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wenchen76.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Sep 09 '25 04:09 mergify[bot]

Documentation preview: https://vllm--24286.org.readthedocs.build/en/24286/

mergify[bot] avatar Oct 08 '25 14:10 mergify[bot]

Hey there, I'm curious about the status of this PR. Is the description still accurate?

reaganjlee avatar Nov 14 '25 23:11 reaganjlee