vllm
Experimental attention backend in Helion
Purpose
This is a very experimental draft PR so far.
Test Plan
VLLM_ATTENTION_BACKEND=EXPERIMENTAL_HELION_ATTN vllm serve meta-llama/Llama-3.1-8B-Instruct
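Once the server is up, a quick smoke test can be run against vLLM's OpenAI-compatible API. The sketch below is a minimal stdlib-only client, assuming the default endpoint (http://localhost:8000/v1) and the model name from the serve command above; adjust host/port if the server was launched differently.

```python
import json
import urllib.request

# Assumed default address of the vLLM OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"


def build_payload(model: str, prompt: str, max_tokens: int = 32) -> dict:
    """Build a /v1/completions request body."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def complete(prompt: str,
             model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
    """Send one completion request and return the generated text."""
    req = urllib.request.Request(
        BASE_URL + "/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("The capital of France is"))
```

Comparing the output of this request with the experimental backend enabled versus the default backend is one way to sanity-check the new attention path end to end.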
Test Result
t.b.a.
Essential Elements of an Effective PR Description Checklist
- [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before and after, or e2e results
- [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
- [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @bringlein.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Documentation preview: https://vllm--27293.org.readthedocs.build/en/27293/