Feature Request: Add Support for Qwen3-Reranker Model
Feature request
Description:
I would like to request support for the Qwen3-Reranker model (specifically Qwen3-Reranker-0.6B) in the text-embeddings-inference repository.
Currently, there is an issue when trying to serve a Qwen3-Reranker checkpoint converted from Qwen3ForCausalLM to Qwen3ForSequenceClassification: the server fails at startup with an error indicating that the classifier model type is not supported for Qwen3.
Additional Context:
The Qwen3-Reranker model has been discussed on HuggingFace (reference: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3), but proper integration with the inference server seems to require additional support.
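For context, that discussion describes initializing a single-label classification head from the "yes"/"no" rows of the causal LM head (the reranker expresses relevance as the logit gap between those two tokens at the last position). A minimal sketch of that idea follows; the output path is a placeholder and the clf.score attribute name follows transformers' Qwen3ForSequenceClassification, so treat this as a hedged sketch rather than a verified recipe. Even with a converted checkpoint, TEI still rejects the architecture, which is what this request is about.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

src = "Qwen/Qwen3-Reranker-0.6B"
dst = "/PATH/Qwen3-Reranker-0.6B-seq-cls"  # placeholder output path

tokenizer = AutoTokenizer.from_pretrained(src)
causal = AutoModelForCausalLM.from_pretrained(src)

# The reranker scores a (query, document) pair via the "yes" vs. "no"
# logit gap at the last position, so a num_labels=1 head can be
# initialized from the difference of those two lm_head rows.
yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

clf = AutoModelForSequenceClassification.from_pretrained(src, num_labels=1)
with torch.no_grad():
    clf.score.weight.copy_(
        causal.lm_head.weight[yes_id] - causal.lm_head.weight[no_id]
    )

clf.save_pretrained(dst)
tokenizer.save_pretrained(dst)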
Testing was done with the Docker image ghcr.io/huggingface/text-embeddings-inference:turing-1.7.2.
Error traceback:
rerank-qwen3 | 2025-06-17T02:12:36.220459Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
rerank-qwen3 | 2025-06-17T02:12:36.639564Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:463: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
rerank-qwen3 | 2025-06-17T02:12:36.640020Z ERROR text_embeddings_backend: backends/src/lib.rs:388: Could not start Candle backend: Could not start backend: classifier model type is not supported for Qwen3
rerank-qwen3 | Error: Could not create backend
rerank-qwen3 |
rerank-qwen3 | Caused by:
rerank-qwen3 |     Could not start backend: Could not start a suitable backend
Requested Features:
Add support for Qwen3-Reranker model architecture
Implement proper handling of the sequence classification variant
Include the model in the supported model types for reranking tasks
Use Case:
This would enable users to deploy Qwen3-Reranker as part of their embedding and retrieval pipelines using the optimized inference server.
Would you be able to provide guidance on what would be needed to implement this support? I'm happy to provide additional details or testing if needed.
Motivation
Qwen3-Reranker is a high-performance reranking model developed by Alibaba Cloud, offering a strong balance between efficiency and accuracy for retrieval-augmented generation (RAG) and semantic search tasks. Currently, text-embeddings-inference (TEI) does not support Qwen3ForSequenceClassification, making it difficult to deploy Qwen3-Reranker in optimized inference pipelines.
Supporting Qwen3-Reranker in TEI would:
Enable seamless integration with existing RAG and search systems.
Provide optimized inference (e.g., FlashAttention, dynamic batching) compared to manual deployment.
Expand TEI's coverage of popular open-weight models, aligning with the growing adoption of the Qwen series (Qwen2, Qwen1.5, etc.).
Given the increasing use of Qwen models in industry and research, adding native support for Qwen3-Reranker would significantly improve user experience and broaden TEI's applicability.
Your contribution
I'm opening this issue to request support for Qwen3-Reranker. While I don't have a concrete implementation yet, I'm happy to:
- Provide testing on different hardware environments
- Share benchmark results
- Collaborate on validating any potential solutions
Looking forward to the support of Qwen3-Reranker series models!
The Qwen3 embedding and rerank models, which are based on the Qwen3 chat model, perform quite well in several fields; please consider this request.
Any update on this?
What is required in order to properly run qwen3 rerankers with the latest TEI version?
Is using --pooling last-token enough?
Can anyone guide us?
Thanks in advance!
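For concreteness, the invocation I have in mind would be something like this (the image tag and local path are just examples):

docker run --gpus all -p 8080:80 -v /root/Qwen3-Reranker-0.6B:/data \
  ghcr.io/huggingface/text-embeddings-inference:1.8.0 \
  --model-id /data --pooling last-token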
Thanks for the reply. I used the following command:
docker run --gpus all -p 8080:80 -v /root/Qwen3-Reranker-0.6B:/data ghcr.io/huggingface/text-embeddings-inference:1.8.0 --model-id /data
2025-08-08T15:26:34.314552Z INFO text_embeddings_router: router/src/main.rs:202: Args { model_id: "/****", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: Some("2_Dense"), hf_api_token: None, hf_token: None, hostname: "8f210320888f", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
Error: The --pooling arg is not set and we could not find a pooling configuration (1_Pooling/config.json) for this model.
Caused by: No such file or directory (os error 2)
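(For context, the 1_Pooling/config.json it asks for is the standard sentence-transformers pooling config; a minimal last-token variant would look like the snippet below, where the 1024 dimension is an assumption for the 0.6B model. As shown next, though, a pooling config alone does not make TEI treat the checkpoint as a reranker.)

{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_lasttoken": true
}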
Then I checked the content of the config.json file in the Qwen3-Reranker repo (https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/blob/main/config.json); it looks like the config file of a normal language model. So I changed it like this:
{
  "architectures": [
    "Qwen3ForSequenceClassification"
  ],
  "id2label": {
    "0": "LABEL_0"
  },
  "label2id": {
    "LABEL_0": 0
  },
  ... same content ...
}
(py13) root@DESKTOP-FT1RFNR:~# docker run --gpus all -p 8080:80 -v /root/Qwen3-Reranker-0.6B:/data ghcr.io/huggingface/text-embeddings-inference:1.8.0 --model-id /data
2025-08-08T15:38:16.766271Z INFO text_embeddings_router: router/src/main.rs:202: Args { model_id: "/****", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: Some("2_Dense"), hf_api_token: None, hf_token: None, hostname: "c081edb149d5", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-08-08T15:38:17.008357Z WARN text_embeddings_router: router/src/lib.rs:193: Could not find a Sentence Transformers config
2025-08-08T15:38:17.008389Z INFO text_embeddings_router: router/src/lib.rs:197: Maximum number of tokens per request: 40960
2025-08-08T15:38:17.008535Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 20 tokenization workers
2025-08-08T15:38:17.568725Z INFO text_embeddings_router: router/src/lib.rs:239: Starting model backend
2025-08-08T15:38:17.869218Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:466: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
2025-08-08T15:38:17.869949Z ERROR text_embeddings_backend: backends/src/lib.rs:411: Could not start Candle backend: Could not start backend: classifier model type is not supported for Qwen3
Error: Could not create backend
Caused by: Could not start backend: Could not start a suitable backend
I've just added support for this in a PR, please check it out: https://github.com/huggingface/text-embeddings-inference/pull/695
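Once a server including that PR is running, reranking goes through TEI's standard /rerank endpoint; for example:

curl 127.0.0.1:8080/rerank \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is a subfield of machine learning.", "cheese"]}'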
convert_to_st.py
from sentence_transformers import CrossEncoder
# HF Qwen3-Reranker model
src_model = "/PATH/Qwen/Qwen3-Reranker-4B-HF"
# sentence-transformers
dst_model = "/PATH/Qwen/Qwen3-Reranker-4B"
# Loading HuggingFace model
print(f"Loading HF model from {src_model} ...")
model = CrossEncoder(src_model)
# Saving as sentence-transformers
print(f"Saving as sentence-transformers CrossEncoder to {dst_model} ...")
model.save(dst_model)
print("✅ Done! You can now mount this folder to TEI and call /rerank")
Does this solution work for anyone?