[Feature]: Audit and Update Examples To Use `VLLM_USE_V1=1`

Open robertgshaw2-redhat opened this issue 9 months ago • 2 comments

🚀 The feature, motivation and pitch

Many of the examples leverage V0 internals.

We should:

  • raise NotImplementedError if envs.VLLM_USE_V1 is set when running these examples
  • convert them to use V1 if we can
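The first option could look roughly like the guard below. This is only a sketch: `require_v0` is a hypothetical helper name, and it reads the `VLLM_USE_V1` environment variable directly instead of going through the `envs.VLLM_USE_V1` attribute mentioned above, so that the snippet runs without vLLM installed.

```python
import os


def require_v0(example_name: str) -> None:
    """Fail fast when an example that relies on V0 internals runs under V1.

    Hypothetical helper for illustration; not part of vLLM.
    """
    # Assumption: the flag is read straight from the environment here.
    # The issue itself refers to the value as envs.VLLM_USE_V1.
    if os.environ.get("VLLM_USE_V1", "0") == "1":
        raise NotImplementedError(
            f"{example_name} relies on V0 internals and does not support "
            "VLLM_USE_V1=1 yet."
        )
```

An example script would call `require_v0(__file__)` at the top, before touching any engine internals, so that users get a clear error rather than an obscure failure later on.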

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

robertgshaw2-redhat avatar Mar 10 '25 02:03 robertgshaw2-redhat

Hi. May I look into this issue?

devesh-2002 avatar Mar 10 '25 03:03 devesh-2002

@robertgshaw2-redhat I would like to contribute to this issue. Would you mind elaborating on the requirements? I'll start by going through the examples and testing them under VLLM_USE_V1=1, then update them or mark them as not supported, depending on feasibility. Let me know if that's okay.

leoli1208 avatar May 09 '25 23:05 leoli1208

@robertgshaw2-redhat @njhill I’m working on this issue. At first, I planned to write a script to run all the examples with v1, but some examples require complex setups, so now I’m manually checking the implementations to see if they are based on v0 or v1.
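For the examples that need no special setup, a batch run along these lines might still be a useful first pass. It is a minimal sketch: `run_examples_with_v1`, the directory argument, and the timeout are all hypothetical, and it only records whether each script exits cleanly with `VLLM_USE_V1=1` set.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_examples_with_v1(example_dir: str, timeout_s: float = 600.0) -> dict[str, str]:
    """Run every *.py file under example_dir with VLLM_USE_V1=1.

    Returns a mapping from script path to "ok", "failed", or "timeout".
    Hypothetical helper for illustration; not part of vLLM.
    """
    # Copy the current environment and force the V1 flag on for the child.
    env = dict(os.environ, VLLM_USE_V1="1")
    results: dict[str, str] = {}
    for script in sorted(Path(example_dir).rglob("*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                env=env,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            results[str(script)] = "timeout"
            continue
        results[str(script)] = "ok" if proc.returncode == 0 else "failed"
    return results
```

A non-zero exit code only says that something went wrong, not that the script depends on V0 internals, so failures would still need the manual check described above.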

Could you share if there are any documents or guidelines about v0 and v1 patterns or internal indicators that I should look for during this process? It would really help to know exactly what distinguishes v0 from v1 in terms of imports, APIs, or structure.

Also, for examples that are based on v0 internals and not easily convertible, should we just raise NotImplementedError when VLLM_USE_V1=1 is set, or is a full refactor preferred?

Thanks in advance for the help!

leoli1208 avatar Jun 06 '25 05:06 leoli1208

I've finished auditing the offline inference examples and am now working on refactoring them. Should I submit two separate PRs, one for the audit and one for the refactoring, since the change touches more than 10 files? @robertgshaw2-redhat @njhill

leoli1208 avatar Aug 05 '25 00:08 leoli1208

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Nov 03 '25 02:11 github-actions[bot]

I came across situations where V1 switches back to V0 automatically when the attention backend isn't compatible, even though VLLM_USE_V1 is enabled explicitly. Is this issue still open? @robertgshaw2-redhat

ijpq avatar Nov 17 '25 04:11 ijpq