
[Doc] Consistent naming of attention backends

Open · tdoublep opened this pull request 1 year ago · 1 comment

Right now, if you try to enable an unsupported feature (e.g., multi-step scheduling with the xformers backend), you get a message like:

```
ValueError: Multi-Step not supported for attention backend: xformers. Set VLLM_ATTENTION_BACKEND to a value from ['flash-attn', 'rocm-flash-attn', 'flashinfer']
```

I find this confusing because the suggested names do not match the values the environment variable actually accepts in order to enable the corresponding feature (e.g., rocm-flash-attn vs. ROCM_FLASH). The accepted values are all-caps and defined by this enum, and they do not match the string "name" defined in each attention backend class.

This PR fixes that: the error message now suggests a list of strings that will actually work.
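
To make the mismatch concrete, here is a minimal Python sketch (not vLLM's actual code; the enum members and the paired lowercase strings are illustrative stand-ins for vLLM's backend enum and each backend class's name string):

```python
from enum import Enum

# Hypothetical stand-in: the enum member names are what
# VLLM_ATTENTION_BACKEND actually accepts, while the values mimic
# the per-class "name" strings the old error message suggested.
class _Backend(Enum):
    FLASH_ATTN = "flash-attn"
    ROCM_FLASH = "rocm-flash-attn"
    FLASHINFER = "flashinfer"
    XFORMERS = "xformers"

# Before the fix: the error listed the lowercase class names,
# which the environment variable does not accept.
old_suggestion = [b.value for b in _Backend]
# ['flash-attn', 'rocm-flash-attn', 'flashinfer', 'xformers']

# After the fix: the error lists the enum member names, which work,
# e.g. VLLM_ATTENTION_BACKEND=ROCM_FLASH
new_suggestion = [b.name for b in _Backend]
# ['FLASH_ATTN', 'ROCM_FLASH', 'FLASHINFER', 'XFORMERS']
```

With the fix, a user can copy a suggested value straight into the environment variable, e.g. VLLM_ATTENTION_BACKEND=FLASH_ATTN.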

tdoublep · Oct 18, 2024

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of it by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of the following:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

github-actions[bot] · Oct 18, 2024

Please take a look at the CI failure.

DarkLight1337 · Oct 21, 2024

@DarkLight1337 The CI issues are fixed now.

tdoublep · Oct 21, 2024