
Huggingface-TEI reranker not showing up as option

Open rafaol opened this issue 6 months ago • 4 comments


Relevant environment info

- OS: MacOS Sequoia 15.4.1
- Hardware: MacBook Pro with Apple M2
- Continue version: 1.0.6
- IDE version: VSCode 1.99.3
- Model:
- config:
  
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
  - name: Qwen2.5 1.5b Autocomplete
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
  - name: Nomic Text Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed
  - name: MXBAI Embed
    provider: ollama
    model: mxbai-embed-large
    roles:
      - embed
  - name: TEI Reranker
    provider: huggingface-tei
    apiBase: http://localhost:8088
    model: BAAI/bge-reranker-v2-m3
    roles:
      - rerank
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
    params:
      nRetrieve: 32
      nFinal: 16
      useReranking: true
  - provider: web
  - provider: url
  - provider: repo-map
    params:
      includeSignatures: false
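
As a sanity check (not part of the original report), the role assignments above can be verified programmatically. The sketch below mirrors the `models` list from the config as plain Python data (it does not parse the YAML itself) and confirms that exactly one model carries the `rerank` role:

```python
# Minimal sanity check: mirror the `models` list from config.yaml above as
# Python data and confirm exactly one model carries the `rerank` role.
# (Names, providers, and roles are copied verbatim from the config.)
models = [
    {"name": "Autodetect", "provider": "ollama", "roles": []},
    {"name": "Qwen2.5 1.5b Autocomplete", "provider": "ollama",
     "roles": ["autocomplete"]},
    {"name": "Nomic Text Embed", "provider": "ollama", "roles": ["embed"]},
    {"name": "MXBAI Embed", "provider": "ollama", "roles": ["embed"]},
    {"name": "TEI Reranker", "provider": "huggingface-tei",
     "roles": ["rerank"]},
]

rerankers = [m["name"] for m in models if "rerank" in m["roles"]]
print(rerankers)  # → ['TEI Reranker']
```

So the config does declare a reranker; the question is why Continue's UI does not surface it.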

Description

I have been trying to use a local reranker via Hugging Face's Text Embeddings Inference (TEI). TEI was installed locally via cargo with support for Apple's Metal. The server runs fine, and I can verify its output, as shown below.

% text-embeddings-router --model-id "BAAI/bge-reranker-v2-m3" --port 8088
2025-04-23T03:07:23.164769Z  INFO text_embeddings_router: router/src/main.rs:185: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 8088, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-04-23T03:07:23.167464Z  INFO hf_hub: /Users/[omitted]/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/Users/[omitted]/.cache/huggingface/token"    
2025-04-23T03:07:23.170008Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-04-23T03:07:23.170017Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-04-23T03:07:23.490620Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/1_Pooling/config.json)
2025-04-23T03:07:26.176065Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-04-23T03:07:26.589347Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config_sentence_transformers.json)
2025-04-23T03:07:26.589382Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-04-23T03:07:26.589636Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-04-23T03:07:26.589728Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 3.4197675s
2025-04-23T03:07:26.815045Z  WARN text_embeddings_router: router/src/lib.rs:188: Could not find a Sentence Transformers config
2025-04-23T03:07:26.815063Z  INFO text_embeddings_router: router/src/lib.rs:192: Maximum number of tokens per request: 8192
2025-04-23T03:07:26.815078Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
2025-04-23T03:07:27.997825Z  INFO text_embeddings_router: router/src/lib.rs:234: Starting model backend
2025-04-23T03:07:27.997841Z  INFO text_embeddings_backend: backends/src/lib.rs:493: Downloading `model.safetensors`
2025-04-23T03:07:27.998005Z  INFO text_embeddings_backend: backends/src/lib.rs:377: Model weights downloaded in 163.542µs
2025-04-23T03:07:28.008655Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:249: Starting Bert model on Metal(MetalDevice(DeviceId(1)))
2025-04-23T03:07:31.483289Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1795: Starting HTTP server: 0.0.0.0:8088
2025-04-23T03:07:31.483304Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1796: Ready

After running:

curl -X POST http://localhost:8088/rerank  -H "Content-Type: application/json"  -d '{"query": "What is Python?", "texts": ["Python is a programming language.", "Java is a programming language."]}'

I get the expected response:

[{"index":0,"score":0.99958915},{"index":1,"score":0.002157342}]
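
The same health check can be scripted. The sketch below replicates the curl request against the local TEI endpoint using only the standard library (the URL and payload are taken from the report above; `rerank` is a hypothetical helper name):

```python
import json
import urllib.request

# Local TEI server from the report above (assumption: still on port 8088).
TEI_URL = "http://localhost:8088/rerank"

# Same payload as the curl example.
payload = {
    "query": "What is Python?",
    "texts": ["Python is a programming language.",
              "Java is a programming language."],
}

def rerank(url: str = TEI_URL, data: dict = payload) -> list:
    """POST a rerank request to TEI and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Calling `rerank()` against the live server should return the same list of index/score pairs as the curl output above, confirming the TEI side of the setup works.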

The model is added with the rerank role in my config.yaml for Continue, as shown above, and no config errors appear after saving the file. However, it still does not show up as an available option for reranking in the models configuration tab.

(Screenshot: Continue's models configuration tab, with no TEI reranker listed as a rerank option)

To reproduce

  1. Install Continue extension for VSCode on MacOS Sequoia
  2. Install TEI locally with Metal support
  3. Run text-embeddings-router --model-id "BAAI/bge-reranker-v2-m3" --port 8088 (or any other bge reranker)
  4. Add reranker to local assistant's config
  5. Try to select rerank model in Continue's models tab (over the chat box)

rafaol · Apr 23 '25