Huggingface-TEI reranker not showing up as option
Before submitting your bug report
- [x] I believe this is a bug. I'll try to join the Continue Discord for questions
- [x] I'm not able to find an open issue that reports the same bug
- [x] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: MacOS Sequoia 15.4.1
- Hardware: MacBook Pro with Apple M2
- Continue version: 1.0.6
- IDE version: VSCode 1.99.3
- Model:
- config:

```yaml
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
  - name: Qwen2.5 1.5b Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
  - name: Nomic Text Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed
  - name: MXBAI Embed
    provider: ollama
    model: mxbai-embed-large
    roles:
      - embed
  - name: TEI Reranker
    provider: huggingface-tei
    apiBase: http://localhost:8088
    model: BAAI/bge-reranker-v2-m3
    roles:
      - rerank
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
    params:
      nRetrieve: 32
      nFinal: 16
      useReranking: true
  - provider: web
  - provider: url
  - provider: repo-map
    params:
      includeSignatures: false
```
Description
I have been trying to use a local reranker via Hugging Face's Text Embeddings Inference (TEI). TEI was installed locally via cargo with support for Apple's Metal. It runs fine, and I can verify its output, as shown below.
```
% text-embeddings-router --model-id "BAAI/bge-reranker-v2-m3" --port 8088
2025-04-23T03:07:23.164769Z  INFO text_embeddings_router: router/src/main.rs:185: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 8088, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-04-23T03:07:23.167464Z  INFO hf_hub: /Users/[omitted]/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/Users/[omitted]/.cache/huggingface/token"
2025-04-23T03:07:23.170008Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-04-23T03:07:23.170017Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-04-23T03:07:23.490620Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/1_Pooling/config.json)
2025-04-23T03:07:26.176065Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-04-23T03:07:26.589347Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config_sentence_transformers.json)
2025-04-23T03:07:26.589382Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-04-23T03:07:26.589636Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-04-23T03:07:26.589728Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 3.4197675s
2025-04-23T03:07:26.815045Z  WARN text_embeddings_router: router/src/lib.rs:188: Could not find a Sentence Transformers config
2025-04-23T03:07:26.815063Z  INFO text_embeddings_router: router/src/lib.rs:192: Maximum number of tokens per request: 8192
2025-04-23T03:07:26.815078Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
2025-04-23T03:07:27.997825Z  INFO text_embeddings_router: router/src/lib.rs:234: Starting model backend
2025-04-23T03:07:27.997841Z  INFO text_embeddings_backend: backends/src/lib.rs:493: Downloading `model.safetensors`
2025-04-23T03:07:27.998005Z  INFO text_embeddings_backend: backends/src/lib.rs:377: Model weights downloaded in 163.542µs
2025-04-23T03:07:28.008655Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:249: Starting Bert model on Metal(MetalDevice(DeviceId(1)))
2025-04-23T03:07:31.483289Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1795: Starting HTTP server: 0.0.0.0:8088
2025-04-23T03:07:31.483304Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1796: Ready
```
After running:

```shell
curl -X POST http://localhost:8088/rerank \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python?", "texts": ["Python is a programming language.", "Java is a programming language."]}'
```

I get the expected response:

```json
[{"index":0,"score":0.99958915},{"index":1,"score":0.002157342}]
```
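For reference, the same check can be scripted from Python. This is a minimal sketch based only on the curl call and response shape above; the `rerank` and `top_text` helpers are my own names, not part of TEI or Continue:

```python
import json
from urllib.request import Request, urlopen

def rerank(query, texts, base_url="http://localhost:8088"):
    """POST to TEI's /rerank route, mirroring the curl example above."""
    payload = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    req = Request(base_url + "/rerank", data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        # Response is a list of {"index": int, "score": float} objects
        return json.loads(resp.read())

def top_text(texts, scores):
    """Pick the passage with the highest relevance score from a /rerank response."""
    best = max(scores, key=lambda s: s["score"])
    return texts[best["index"]]
```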
The model is added with the `rerank` role in my Continue config.yaml, as shown above. No config errors appear after saving the file. However, it still does not show up as an available reranking option in the models configuration tab.
To reproduce
- Install the Continue extension for VSCode on macOS Sequoia
- Install TEI locally with Metal support
- Run `text-embeddings-router --model-id "BAAI/bge-reranker-v2-m3" --port 8088` (or any other bge reranker)
- Add the reranker to the local assistant's config
- Try to select the rerank model in Continue's models tab (above the chat box)