[Model] Add support for 'gte-Qwen2' embedding models
FIX #6015 FIX #5827 FIX #5611 FIX #5600
This should work for Alibaba-NLP/gte-Qwen2-7B-instruct and Alibaba-NLP/gte-Qwen2-1.5B-instruct.
You can serve an OpenAI-compatible API with:
```bash
python -m vllm.entrypoints.openai.api_server \
    --served-model-name gte-Qwen2-7B-instruct \
    --model Alibaba-NLP/gte-Qwen2-7B-instruct \
    --dtype bfloat16 \
    --trust-remote-code
```
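Once the server is up, embeddings can be requested through the standard OpenAI Python client. A minimal sketch, assuming the server listens on the default port 8000 and uses the served model name from the command above:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (default port 8000 assumed).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="gte-Qwen2-7B-instruct",
    input=["What is the capital of France?", "Paris is the capital of France."],
)

# One embedding vector per input string.
for item in response.data:
    print(len(item.embedding), item.embedding[:4])
```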
However, the current version has an embedding-consistency issue, so it cannot pass the following test. This should be fixed before merging.
```bash
pytest tests/models/test_embedding.py
# FAILED tests/models/test_embedding.py::test_models[half-Alibaba-NLP/gte-Qwen2-7B-instruct] - AssertionError: Not all values are within 0.01 of 1.0
```
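For reference, the assertion suggests the test compares embeddings from the HuggingFace reference implementation against vLLM's output and requires their similarity to stay within 0.01 of 1.0. Below is a rough sketch of that kind of check, not the actual test code; using sentence-transformers on the HF side and vLLM's offline `LLM.encode` API on the other, with illustrative prompts, are all assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

prompts = ["The quick brown fox jumps over the lazy dog."]  # illustrative prompts

# Reference embeddings from the HuggingFace / sentence-transformers implementation.
hf_model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-7B-instruct",
                               trust_remote_code=True)
hf_embeds = hf_model.encode(prompts)

# Embeddings from vLLM's offline embedding API (assumed here to be LLM.encode).
vllm_model = LLM(model="Alibaba-NLP/gte-Qwen2-7B-instruct",
                 trust_remote_code=True, dtype="bfloat16")
vllm_embeds = [out.outputs.embedding for out in vllm_model.encode(prompts)]

# The two implementations should agree: cosine similarity close to 1.0.
for hf_e, vllm_e in zip(hf_embeds, vllm_embeds):
    hf_e, vllm_e = np.asarray(hf_e), np.asarray(vllm_e)
    cos = float(hf_e @ vllm_e / (np.linalg.norm(hf_e) * np.linalg.norm(vllm_e)))
    assert abs(cos - 1.0) < 1e-2, f"cosine similarity {cos} deviates from 1.0"
```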