
Feature Request: Add Support for Qwen3-Reranker Model

Open franklucky001 opened this issue 6 months ago • 13 comments

Feature request

Description:

I would like to request support for the Qwen3-Reranker model (specifically Qwen3-Reranker-0.6B) in the text-embeddings-inference repository.

Currently, there appears to be an issue when trying to convert Qwen3-Reranker from Qwen3ForCausalLM to Qwen3ForSequenceClassification, with the error message indicating that the classifier model type is not supported for Qwen3.

Additional Context:

The Qwen3-Reranker model has been discussed on HuggingFace (reference: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3), but proper integration with the inference server seems to require additional support.
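For background: as described in the linked discussion, Qwen3-Reranker ships as a causal LM and derives its relevance score by comparing the logits of the "yes" and "no" tokens at the final position, rather than from a classifier head. The scoring step reduces to a two-way softmax; a minimal sketch (the logit extraction itself is omitted, and this is my reading of the model card, not TEI code):

```python
import math

def rerank_score(yes_logit: float, no_logit: float) -> float:
    # Two-way softmax over the "yes"/"no" token logits at the last
    # position; the probability of "yes" is the relevance score.
    m = max(yes_logit, no_logit)  # subtract max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

Converting this scheme to a `Qwen3ForSequenceClassification` head is what the issue is about: TEI's Candle backend currently rejects the classifier variant for Qwen3.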

Tested with the Docker image ghcr.io/huggingface/text-embeddings-inference:turing-1.7.2.

Error traceback:

rerank-qwen3 | 2025-06-17T02:12:36.220459Z INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
rerank-qwen3 | 2025-06-17T02:12:36.639564Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:463: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
rerank-qwen3 | 2025-06-17T02:12:36.640020Z ERROR text_embeddings_backend: backends/src/lib.rs:388: Could not start Candle backend: Could not start backend: classifier model type is not supported for Qwen3
rerank-qwen3 | Error: Could not create backend
rerank-qwen3 |
rerank-qwen3 | Caused by:
rerank-qwen3 |     Could not start backend: Could not start a suitable backend

Requested Features:

Add support for Qwen3-Reranker model architecture

Implement proper handling of the sequence classification variant

Include the model in the supported model types for reranking tasks

Use Case:

This would enable users to deploy Qwen3-Reranker as part of their embedding and retrieval pipelines using the optimized inference server.

Would you be able to provide guidance on what would be needed to implement this support? I'm happy to provide additional details or testing if needed.

Motivation

Qwen3-Reranker is a high-performance reranking model developed by Alibaba Cloud, offering a strong balance between efficiency and accuracy for retrieval-augmented generation (RAG) and semantic search tasks. Currently, text-embeddings-inference (TEI) does not support Qwen3ForSequenceClassification, making it difficult to deploy Qwen3-Reranker in optimized inference pipelines.

Supporting Qwen3-Reranker in TEI would:

Enable seamless integration with existing RAG and search systems.

Provide optimized inference (e.g., FlashAttention, dynamic batching) compared to manual deployment.

Expand TEI's coverage of popular open-weight models, aligning with the growing adoption of the Qwen series (Qwen2, Qwen1.5, etc.).

Given the increasing use of Qwen models in industry and research, adding native support for Qwen3-Reranker would significantly improve user experience and broaden TEI's applicability.

Your contribution

I'm opening this issue to request support for Qwen3-Reranker. While I don't have a concrete implementation yet, I'm happy to:

  • Provide testing on different hardware environments
  • Share benchmark results
  • Collaborate on validating any potential solutions

franklucky001 avatar Jun 17 '25 02:06 franklucky001

Looking forward to the support of Qwen3-Reranker series models!

qianjia-cn avatar Jun 19 '25 01:06 qianjia-cn

The Qwen3 embedding and reranker models, which are based on the Qwen3 chat model, perform very well in several fields. Please consider this request.

wrpromail avatar Jul 04 '25 05:07 wrpromail

Looking forward to the support of Qwen3-Reranker series models!

xiaobing395845388 avatar Jul 10 '25 06:07 xiaobing395845388

Looking forward to the support of Qwen3-Reranker series models!

lmolhw5252 avatar Jul 18 '25 03:07 lmolhw5252

+ 1

psc0606 avatar Jul 21 '25 03:07 psc0606

Looking forward to the support of Qwen3-Reranker series models!

dkasa avatar Jul 22 '25 03:07 dkasa

Looking forward to the support of Qwen3-Reranker series models!

scd10 avatar Jul 25 '25 09:07 scd10

Any update on this?

BrightXiaoHan avatar Jul 29 '25 02:07 BrightXiaoHan

What is required in order to properly run qwen3 rerankers with the latest TEI version? Is using --pooling last-token enough? Can anyone guide us? Thanks in advance!

bgeneto avatar Jul 29 '25 17:07 bgeneto

What is required in order to properly run qwen3 rerankers with the latest TEI version? Is using --pooling last-token enough? Can anyone guide us? Thanks in advance!

Thanks for the reply. I used the following command:

docker run --gpus all -p 8080:80 -v /root/Qwen3-Reranker-0.6B:/data ghcr.io/huggingface/text-embeddings-inference:1.8.0 --model-id /data

2025-08-08T15:26:34.314552Z INFO text_embeddings_router: router/src/main.rs:202: Args { model_id: "/****", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: Some("2_Dense"), hf_api_token: None, hf_token: None, hostname: "8f210320888f", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
Error: The --pooling arg is not set and we could not find a pooling configuration (1_Pooling/config.json) for this model.

Caused by: No such file or directory (os error 2)

Then I checked the content of the config.json file in the Qwen3 reranker repo (https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/blob/main/config.json); it looks like the config file of a normal language model.

Then I changed it like this:

{
  "architectures": [
    "Qwen3ForSequenceClassification"
  ],
  "id2label": {
    "0": "LABEL_0"
  },
  "label2id": {
    "LABEL_0": 0
  },
... same content...
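The same edit can be scripted. A small sketch (not the author's exact steps; the label names are the placeholder values shown above, and the remaining config fields are left untouched):

```python
import json
from pathlib import Path

def patch_config(cfg: dict) -> dict:
    # Swap the causal-LM architecture for the sequence-classification one
    # and add minimal label maps for a single-logit classifier head.
    cfg = dict(cfg)
    cfg["architectures"] = ["Qwen3ForSequenceClassification"]
    cfg["id2label"] = {"0": "LABEL_0"}
    cfg["label2id"] = {"LABEL_0": 0}
    return cfg

# Usage against the host-mounted model dir from the docker command above:
#   path = Path("/root/Qwen3-Reranker-0.6B/config.json")
#   path.write_text(json.dumps(patch_config(json.loads(path.read_text())), indent=2))
```

Note that this config edit alone was not enough here, as the rerun below shows.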

(py13) root@DESKTOP-FT1RFNR:~# docker run --gpus all -p 8080:80 -v /root/Qwen3-Reranker-0.6B:/data ghcr.io/huggingface/text-embeddings-inference:1.8.0 --model-id /data
2025-08-08T15:38:16.766271Z INFO text_embeddings_router: router/src/main.rs:202: Args { model_id: "/****", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: Some("2_Dense"), hf_api_token: None, hf_token: None, hostname: "c081edb149d5", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-08-08T15:38:17.008357Z WARN text_embeddings_router: router/src/lib.rs:193: Could not find a Sentence Transformers config
2025-08-08T15:38:17.008389Z INFO text_embeddings_router: router/src/lib.rs:197: Maximum number of tokens per request: 40960
2025-08-08T15:38:17.008535Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 20 tokenization workers
2025-08-08T15:38:17.568725Z INFO text_embeddings_router: router/src/lib.rs:239: Starting model backend
2025-08-08T15:38:17.869218Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:466: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
2025-08-08T15:38:17.869949Z ERROR text_embeddings_backend: backends/src/lib.rs:411: Could not start Candle backend: Could not start backend: classifier model type is not supported for Qwen3
Error: Could not create backend

Caused by: Could not start backend: Could not start a suitable backend

wrpromail avatar Aug 08 '25 15:08 wrpromail

Just added the support in this PR, please check it out: https://github.com/huggingface/text-embeddings-inference/pull/695

sigridjineth avatar Aug 10 '25 19:08 sigridjineth

convert_to_st.py

from sentence_transformers import CrossEncoder

# HF Qwen3-Reranker model
src_model = "/PATH/Qwen/Qwen3-Reranker-4B-HF"

# sentence-transformers
dst_model = "/PATH/Qwen/Qwen3-Reranker-4B"

# Loading HuggingFace model
print(f"Loading HF model from {src_model} ...")
model = CrossEncoder(src_model)

# Saving as sentence-transformers
print(f"Saving as sentence-transformers CrossEncoder to {dst_model} ...")
model.save(dst_model)

print("✅ Done! You can now mount this folder to TEI and call /rerank")
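Once a converted model loads, TEI's /rerank route takes a query plus candidate texts and returns a score per text. A sketch of a client call, assuming the `-p 8080:80` port mapping from the docker commands earlier in the thread (stdlib only, so no extra dependencies):

```python
import json
from urllib import request

TEI_URL = "http://localhost:8080/rerank"  # assumes the -p 8080:80 mapping above

def build_rerank_payload(query: str, texts: list[str]) -> dict:
    # TEI's /rerank endpoint scores each candidate text against the query
    return {"query": query, "texts": texts}

def rerank(query: str, texts: list[str]) -> list[dict]:
    req = request.Request(
        TEI_URL,
        data=json.dumps(build_rerank_payload(query, texts)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Response is a list of {"index": ..., "score": ...} entries
        return json.loads(resp.read())
```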

dkasa avatar Aug 11 '25 02:08 dkasa

convert_to_st.py

from sentence_transformers import CrossEncoder

# HF Qwen3-Reranker model
src_model = "/PATH/Qwen/Qwen3-Reranker-4B-HF"

# sentence-transformers
dst_model = "/PATH/Qwen/Qwen3-Reranker-4B"

# Loading HuggingFace model
print(f"Loading HF model from {src_model} ...")
model = CrossEncoder(src_model)

# Saving as sentence-transformers
print(f"Saving as sentence-transformers CrossEncoder to {dst_model} ...")
model.save(dst_model)

print("✅ Done! You can now mount this folder to TEI and call /rerank")

Does this solution work for anyone?

forrestbao avatar Oct 21 '25 07:10 forrestbao