[Feature]: Audit and Update Examples To Use `VLLM_USE_V1=1`
🚀 The feature, motivation and pitch
Many of the examples leverage V0 internals.
We should:
- raise `NotImplementedError` if `envs.VLLM_USE_V1` with these (see the sketch after this list)
- convert them to use V1 if we can
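For reference, a minimal sketch of what such a guard could look like; `envs.VLLM_USE_V1` is the real vLLM flag named above, but the message wording and placement at the top of the example are assumptions:

```python
# Hypothetical guard for an example that still relies on V0 internals.
# envs.VLLM_USE_V1 reflects the VLLM_USE_V1 environment variable.
import vllm.envs as envs

if envs.VLLM_USE_V1:
    raise NotImplementedError(
        "This example relies on V0 internals and does not support "
        "VLLM_USE_V1=1 yet. Run it with VLLM_USE_V1=0."
    )
```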
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Hi. May I look into this issue?
@robertgshaw2-redhat I would like to contribute to this issue. Would you mind elaborating on the requirements in more detail? My plan is to go through the examples, test them under VLLM_USE_V1=1, and then update them or mark them as not supported depending on feasibility. Let me know if that's okay.
@robertgshaw2-redhat @njhill I’m working on this issue. At first, I planned to write a script to run all the examples with v1, but some examples require complex setups, so now I’m manually checking the implementations to see whether they are based on v0 or v1.
Could you share whether there are any documents or guidelines about v0 and v1 patterns, or internal indicators I should look for during this process? It would really help to know exactly what distinguishes v0 from v1 in terms of imports, APIs, or structure.
Also, for some examples that are based on v0 internals and not easily convertible, should we just raise `NotImplementedError` when `VLLM_USE_V1=1` is set, or is a full refactor preferred?
Thanks in advance for the help!
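For what it's worth, a minimal sketch of such a runner, assuming the examples live as standalone scripts under `examples/offline_inference/`; the directory layout and timeout are assumptions, and examples that need servers, multiple GPUs, or large downloads will still need manual checking:

```python
# Hypothetical audit runner: execute each example under VLLM_USE_V1=1 and
# report whether it exits cleanly.
import os
import subprocess
import sys
from pathlib import Path

EXAMPLES_DIR = Path("examples/offline_inference")  # assumed location

def runs_under_v1(script: Path) -> bool:
    """Return True if the example exits cleanly with VLLM_USE_V1=1."""
    env = dict(os.environ, VLLM_USE_V1="1")
    try:
        result = subprocess.run(
            [sys.executable, str(script)],
            env=env,
            timeout=600,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

if __name__ == "__main__":
    for script in sorted(EXAMPLES_DIR.glob("*.py")):
        status = "PASS" if runs_under_v1(script) else "FAIL"
        print(f"{status}: {script.name}")
```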
I've finished auditing the offline inference part and am now refactoring the offline inference examples. Should I submit two separate PRs, one for the audit and one for the refactoring, since the changes touch more than 10 files? @robertgshaw2-redhat @njhill
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
I came across certain situations where V1 switches back to V0 automatically when the attention backend isn't compatible, even though `VLLM_USE_V1` is enabled explicitly. Is this issue still open? @robertgshaw2-redhat
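One way to confirm which engine is actually active is to inspect the engine class after construction; a minimal sketch, assuming `LLM` exposes its engine as `llm_engine` and that V1 engine classes live under the `vllm.v1` package (the model name is just a placeholder):

```python
# Hypothetical fallback check: build an LLM with VLLM_USE_V1=1 and see which
# engine class was actually instantiated.
import os

os.environ["VLLM_USE_V1"] = "1"  # must be set before importing vllm

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model
engine_cls = type(llm.llm_engine)
print(f"{engine_cls.__module__}.{engine_cls.__qualname__}")

# A module path under "vllm.v1" suggests the V1 engine is active; anything
# else suggests a silent fallback to V0.
if not engine_cls.__module__.startswith("vllm.v1"):
    print("Warning: engine appears to have fallen back to V0.")
```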