[Model] Add user-configurable task for models that support both generation and embedding
Follow-up to #9303
This PR adds a `--task` option which determines which model runner (for generation or embedding) to create when initializing vLLM. The default (`auto`) selects the first task that the model supports. For embedding models which share the same model architecture as the base model, users can explicitly specify which task to initialize vLLM for, e.g.:
```shell
# Both models use Phi3VForCausalLM
vllm serve microsoft/Phi-3.5-vision-instruct --task generate <extra_args>
vllm serve TIGER-Lab/VLM2Vec-Full --task embedding <extra_args>
```
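To illustrate the `auto` behavior described above, here is a minimal sketch of how task resolution could work; the `SUPPORTED_TASKS` mapping and `resolve_task` function are hypothetical names for illustration, not vLLM's actual internals.

```python
# Hypothetical mapping from model architecture to the tasks it supports,
# in priority order. Illustrative only; not vLLM's real registry.
SUPPORTED_TASKS = {
    "Phi3VForCausalLM": ["generate", "embedding"],
}


def resolve_task(requested: str, architecture: str) -> str:
    """Resolve the --task option: "auto" picks the first supported task."""
    supported = SUPPORTED_TASKS[architecture]
    if requested == "auto":
        return supported[0]
    if requested not in supported:
        raise ValueError(
            f"{architecture} does not support task {requested!r}; "
            f"supported tasks: {supported}"
        )
    return requested


print(resolve_task("auto", "Phi3VForCausalLM"))       # -> generate
print(resolve_task("embedding", "Phi3VForCausalLM"))  # -> embedding
```

Under this scheme, both Phi3VForCausalLM models above default to `generate`, which is why VLM2Vec users must pass `--task embedding` explicitly.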
(Also addresses https://github.com/vllm-project/vllm/pull/6282#discussion_r1672882380.)
Since the new `task` option is semantically related to the `model` argument, I've placed it right after `model`, before `tokenizer`. To avoid incompatibilities resulting from this change, I have added backwards compatibility and deprecated the usage of positional arguments apart from `model` in `LLM.__init__`.
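The backwards-compatibility shim might look like the following sketch, which only illustrates the deprecation pattern; the parameter handling is simplified and is not the actual vLLM implementation.

```python
import warnings


class LLM:
    """Sketch of LLM.__init__ keeping only `model` positional.

    Passing further arguments positionally still works (they are mapped
    to their old positions) but emits a DeprecationWarning.
    """

    def __init__(self, model, *args, tokenizer=None, task="auto", **kwargs):
        if args:
            warnings.warn(
                "Passing arguments other than 'model' positionally is "
                "deprecated; use keyword arguments instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Map legacy positional args back to their old slots
            # (only `tokenizer` shown here for brevity).
            tokenizer = args[0]
        self.model = model
        self.tokenizer = tokenizer
        self.task = task
```

With this in place, `LLM("my-model", "my-tokenizer")` keeps working while warning the user, and `LLM("my-model", tokenizer="my-tokenizer", task="embedding")` is the forward-compatible spelling.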
Note: This introduces a breaking change for VLM2Vec users, as they currently do not have to pass `--task embedding` thanks to the hardcoded embedding model detection logic. Nevertheless, requiring the user to set the task explicitly is more maintainable in the long run as the number of embedding models increases.