[Model] Add user-configurable task for models that support both generation and embedding
Follow-up to #9303
This PR adds a `--task` option which determines which model runner (for generation or embedding) to create when initializing vLLM. The default (`auto`) selects the first task that the model supports. For embedding models which share the same model architecture as the base model, users can explicitly specify which task to initialize vLLM for, e.g.:
```shell
# Both models use Phi3VForCausalLM
vllm serve microsoft/Phi-3.5-vision-instruct --task generate <extra_args>
vllm serve TIGER-Lab/VLM2Vec-Full --task embedding <extra_args>
```
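To illustrate the `auto` behavior described above, here is a minimal sketch of how task resolution could work; the `SUPPORTED_TASKS` mapping and `resolve_task` function are hypothetical names for illustration, not vLLM's actual internals.

```python
# Hypothetical mapping from model architecture to the tasks it supports,
# in priority order. Illustrative only; not vLLM's real registry.
SUPPORTED_TASKS = {
    "Phi3VForCausalLM": ["generate", "embedding"],
}


def resolve_task(requested: str, architecture: str) -> str:
    """Resolve the --task option: "auto" picks the first supported task."""
    supported = SUPPORTED_TASKS[architecture]
    if requested == "auto":
        return supported[0]
    if requested not in supported:
        raise ValueError(
            f"{architecture} does not support task {requested!r}; "
            f"supported tasks: {supported}"
        )
    return requested


print(resolve_task("auto", "Phi3VForCausalLM"))       # -> generate
print(resolve_task("embedding", "Phi3VForCausalLM"))  # -> embedding
```

Under this scheme, both Phi3VForCausalLM models above default to `generate`, which is why VLM2Vec users must pass `--task embedding` explicitly.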
(Also addresses https://github.com/vllm-project/vllm/pull/6282#discussion_r1672882380.)
Since the new `task` option is semantically related to the `model` argument, I've placed it right after `model`, before `tokenizer`. To avoid incompatibilities resulting from this change, I have added backwards compatibility and deprecated the usage of positional arguments apart from `model` in `LLM.__init__`.
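The backwards-compatibility shim might look like the following sketch, which only illustrates the deprecation pattern; the parameter handling is simplified and is not the actual vLLM implementation.

```python
import warnings


class LLM:
    """Sketch of LLM.__init__ keeping only `model` positional.

    Passing further arguments positionally still works (they are mapped
    to their old positions) but emits a DeprecationWarning.
    """

    def __init__(self, model, *args, tokenizer=None, task="auto", **kwargs):
        if args:
            warnings.warn(
                "Passing arguments other than 'model' positionally is "
                "deprecated; use keyword arguments instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Map legacy positional args back to their old slots
            # (only `tokenizer` shown here for brevity).
            tokenizer = args[0]
        self.model = model
        self.tokenizer = tokenizer
        self.task = task
```

With this in place, `LLM("my-model", "my-tokenizer")` keeps working while warning the user, and `LLM("my-model", tokenizer="my-tokenizer", task="embedding")` is the forward-compatible spelling.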
Note: This introduces a breaking change for VLM2Vec users, as they currently do not have to pass `--task embedding` thanks to the hardcoded embedding model detection logic. Nevertheless, requiring the user to set the task explicitly is more maintainable in the long run as the number of embedding models increases.