[Model] Add support for 'gte-Qwen2' embedding models
FIX #6015 FIX #5827 FIX #5611 FIX #5600
This should work for Alibaba-NLP/gte-Qwen2-7B-instruct and Alibaba-NLP/gte-Qwen2-1.5B-instruct.
You can serve an OpenAI-compatible API with:
```bash
python -m vllm.entrypoints.openai.api_server \
    --served-model-name gte-Qwen2-7B-instruct \
    --model Alibaba-NLP/gte-Qwen2-7B-instruct \
    --dtype bfloat16 \
    --trust-remote-code
```
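Once the server is up, embeddings can be requested through the standard OpenAI Python client. A minimal sketch, assuming the server listens on the default port 8000 and uses the served model name from the command above:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (default port 8000 assumed).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="gte-Qwen2-7B-instruct",
    input=["What is the capital of France?", "Paris is the capital of France."],
)

# One embedding vector per input string.
for item in response.data:
    print(len(item.embedding), item.embedding[:4])
```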
However, the current version has an embedding-consistency issue, so it cannot pass the following test. This should be fixed before merging.
```bash
pytest tests/models/test_embedding.py
# FAILED tests/models/test_embedding.py::test_models[half-Alibaba-NLP/gte-Qwen2-7B-instruct] - AssertionError: Not all values are within 0.01 of 1.0
```
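For reference, the assertion suggests the test compares embeddings from the HuggingFace reference implementation against vLLM's output and requires their similarity to stay within 0.01 of 1.0. Below is a rough sketch of that kind of check, not the actual test code; using sentence-transformers on the HF side and vLLM's offline `LLM.encode` API on the other, with illustrative prompts, are all assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

prompts = ["The quick brown fox jumps over the lazy dog."]  # illustrative prompts

# Reference embeddings from the HuggingFace / sentence-transformers implementation.
hf_model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-7B-instruct",
                               trust_remote_code=True)
hf_embeds = hf_model.encode(prompts)

# Embeddings from vLLM's offline embedding API (assumed here to be LLM.encode).
vllm_model = LLM(model="Alibaba-NLP/gte-Qwen2-7B-instruct",
                 trust_remote_code=True, dtype="bfloat16")
vllm_embeds = [out.outputs.embedding for out in vllm_model.encode(prompts)]

# The two implementations should agree: cosine similarity close to 1.0.
for hf_e, vllm_e in zip(hf_embeds, vllm_embeds):
    hf_e, vllm_e = np.asarray(hf_e), np.asarray(vllm_e)
    cos = float(hf_e @ vllm_e / (np.linalg.norm(hf_e) * np.linalg.norm(vllm_e)))
    assert abs(cos - 1.0) < 1e-2, f"cosine similarity {cos} deviates from 1.0"
```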