vllm
[Feature]: Return hidden states (in progress?)
🚀 The feature, motivation and pitch
I know this feature request sort of already exists: https://github.com/vllm-project/vllm/issues/5950 (plus older, semi-related requests: https://github.com/vllm-project/vllm/issues/3594 and https://github.com/vllm-project/vllm/issues/1857).
This is a similar pitch but I am creating a new issue as I noticed newer developments in the codebase. The pitch is to support returning hidden states when generating sequences. This enables many potential behaviors such as output classification, guardrails, etc. Whereas #5950 suggested a different step for embedding, I would suggest building it in as an option to EngineArgs or as an option that can be passed in with each generation request.
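To make the two options concrete, here is a rough sketch of the behavior I have in mind: an engine-wide default plus a per-request override. Every class and function name below is hypothetical and invented for illustration; none of them are existing vLLM APIs, and the real EngineArgs does not have this field today.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HypotheticalEngineArgs:
    # Engine-wide default. Hypothetical: not a real vLLM EngineArgs field.
    return_hidden_states: bool = False


@dataclass
class HypotheticalRequestParams:
    # Per-request override. None means "fall back to the engine default".
    return_hidden_states: Optional[bool] = None


@dataclass
class HypotheticalOutput:
    text: str
    # One vector per generated token, or None when hidden states were not requested.
    hidden_states: Optional[List[List[float]]] = None


def wants_hidden_states(engine_args: HypotheticalEngineArgs,
                        params: HypotheticalRequestParams) -> bool:
    """Resolve the effective setting: the request-level value wins when set."""
    if params.return_hidden_states is not None:
        return params.return_hidden_states
    return engine_args.return_hidden_states
```

With this shape, a deployment could leave hidden states off by default and enable them only for the requests that feed a classifier or guardrail.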
I see that in v0.5.1 there is already some new code in ModelDriverBase to support return_hidden_states. However, I don't see that supported in the LLM engine yet (it is not an input to EngineArgs). Basically, it seems like this feature is under development. I am mainly wondering what the timeline is for that, and what approach is being taken, so that I and the community can develop accordingly.
Alternatives
No response
Additional context
No response