
Input validation error: `inputs` must have less than 512 tokens. Given: 534

Open gctian opened this issue 1 year ago • 9 comments

System Info

  • text-embeddings-router 1.1.0

  • python3.10

  • centos

  • A800

Information

  • [ ] Docker
  • [X] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

text-embeddings-router starts an embedding service, but requests keep failing with `Input validation error: inputs must have less than 512 tokens`. Which parameter should I use to change the maximum number of input tokens? 512 is too short, and I cannot find the right parameter in the README or in --help.

Expected behavior

Which parameter should I use to change the maximum number of input tokens?

gctian avatar Jul 26 '24 06:07 gctian

Which model are you using? The max length of inputs is usually determined by the model's max length, and TEI usually provides a `truncate` parameter to decide whether you want to shorten the text or return an error.

vrdn-23 avatar Aug 05 '24 21:08 vrdn-23
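
For the CLI / client case above, the request-level option is passed in the JSON body of the embedding call. Below is a minimal sketch, assuming a TEI server listening on http://localhost:8080 and its standard /embed route; with "truncate": True the server clips over-long inputs to the model's maximum length instead of returning the 512-token error:

    import requests

    # Minimal sketch: assumes TEI is serving embeddings on localhost:8080.
    # "truncate": True asks the server to clip inputs that exceed the model's
    # maximum sequence length instead of rejecting them.
    resp = requests.post(
        "http://localhost:8080/embed",
        json={
            "inputs": ["a very long passage that may exceed 512 tokens ..."],
            "truncate": True,
        },
        timeout=30,
    )
    resp.raise_for_status()
    embeddings = resp.json()  # one embedding vector per input string
    print(len(embeddings), len(embeddings[0]))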

> Which model are you using? The max length of inputs is usually determined by the model's max length, and TEI usually provides a `truncate` parameter to decide whether you want to shorten the text or return an error.

To solve the issue of 'Input validation error: inputs must have less than 512 tokens', I added the auto-truncate parameter set to 'true' when starting the Docker image, but the container failed to start. What is the reason for this? The startup command was roughly: `docker run --rm --gpus all -d -p 18082:80 --name multilingual-e5-large -v <model mount path> <image address> --model-id <model id> --auto-truncate true`

ellahe-git avatar Aug 06 '24 02:08 ellahe-git

> Which model are you using? The max length of inputs is usually determined by the model's max length, and TEI usually provides a `truncate` parameter to decide whether you want to shorten the text or return an error.

Sorry, it's a problem with the embedding model, not TEI.

gctian avatar Aug 06 '24 10:08 gctian

> Which model are you using? The max length of inputs is usually determined by the model's max length, and TEI usually provides a `truncate` parameter to decide whether you want to shorten the text or return an error.
>
> Sorry, it's a problem with the embedding model, not TEI.

When starting the TEI image with the --auto-truncate true parameter added, the container cannot start. What is the reason for this?

ellahe-git avatar Aug 07 '24 02:08 ellahe-git

> Which model are you using? The max length of inputs is usually determined by the model's max length, and TEI usually provides a `truncate` parameter to decide whether you want to shorten the text or return an error.
>
> Sorry, it's a problem with the embedding model, not TEI.
>
> When starting the TEI image with the --auto-truncate true parameter added, the container cannot start. What is the reason for this?

I'm using the TEI client, not the Docker image.

gctian avatar Aug 07 '24 12:08 gctian

I'm also facing the same issue, with `inputs must have less than 512 tokens`.

raaj1v avatar Aug 30 '24 11:08 raaj1v

I'm getting the 512 error for BAAI/bge-large-en-v1.5. What needs to be changed here?

The code is based on automatic_embedding_tei_inference_endpoints#inference-endpoints

    ep = create_inference_endpoint(
        ep_name,
        repository="BAAI/bge-large-en-v1.5",
        framework="pytorch",
        accelerator="gpu",
        instance_size="x1",
        instance_type="nvidia-l4",
        region="us-east-1",
        vendor="aws",
        min_replica=0,
        max_replica=1,
        task="sentence-embeddings",
        type=InferenceEndpointType.PROTECTED,
        namespace="newsrx",
        custom_image={
            "health_route": "/health",
            "url": "ghcr.io/huggingface/text-embeddings-inference:1.5.0",
            "env": {
                "MAX_BATCH_TOKENS": "16384",
                "MAX_CONCURRENT_REQUESTS": "512",
                "MODEL_ID": "/repository",
                "QUANTIZE": "eetq",
            },
        },
    )

michael-newsrx avatar Oct 31 '24 13:10 michael-newsrx
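
For the Inference Endpoints configuration above, TEI's CLI arguments also document an AUTO_TRUNCATE environment variable as the counterpart of the --auto-truncate flag. Below is a sketch of the same custom_image block with that variable added; the assumption that a truthy string value enables truncation in the container has not been verified on this exact endpoint:

    # Sketch: the custom_image dict passed to create_inference_endpoint,
    # with AUTO_TRUNCATE added to the container environment.
    custom_image = {
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-embeddings-inference:1.5.0",
        "env": {
            "MAX_BATCH_TOKENS": "16384",
            "MAX_CONCURRENT_REQUESTS": "512",
            "MODEL_ID": "/repository",
            "QUANTIZE": "eetq",
            # Assumption: maps to the --auto-truncate CLI flag and makes the
            # server truncate over-long inputs instead of erroring.
            "AUTO_TRUNCATE": "true",
        },
    }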

If you pass truncate=True in the payload, it will automatically truncate the input and you won't have this issue.

A better chunking strategy might also be useful rather than simple truncation

nbroad1881 avatar Nov 08 '24 02:11 nbroad1881
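
On the chunking point: rather than truncating or cutting at a fixed character count, a token-based sliding window keeps every piece under the model limit while preserving some context across boundaries. Below is a minimal sketch, assuming the transformers library is installed and using BAAI/bge-large-en-v1.5 as an illustrative tokenizer; the window and stride sizes are arbitrary choices, not recommendations:

    from transformers import AutoTokenizer

    def chunk_by_tokens(text, tokenizer, max_tokens=510, stride=50):
        """Split text into overlapping chunks of at most max_tokens tokens.

        max_tokens stays slightly below 512 to leave room for the special
        tokens the model adds around each sequence.
        """
        ids = tokenizer.encode(text, add_special_tokens=False)
        chunks = []
        start = 0
        while start < len(ids):
            window = ids[start : start + max_tokens]
            chunks.append(tokenizer.decode(window, skip_special_tokens=True))
            if start + max_tokens >= len(ids):
                break
            start += max_tokens - stride  # overlap consecutive chunks by `stride` tokens
        return chunks

    tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
    pieces = chunk_by_tokens("some very long document ...", tokenizer)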

truncate and --auto-truncate are not feasible for me. I fixed this problem by replacing tei_fast_rerank.py entirely with the following code:


from __future__ import annotations

from typing import Optional

import requests

from kotaemon.base import Document, Param

from .base import BaseReranking

session = requests.session()


class TeiFastReranking(BaseReranking):
    endpoint_url: str = Param(
        None,  # e.g. 'http://localhost:8080/rerank'
        help="TEI Reranking service api base URL",
        required=True,
    )
    model_name: Optional[str] = Param(
        None,
        # "BAAI/bge-reranker-large",
        help=(
            "ID of the model to use. You can go to [Supported Models]"
            "(https://github.com/huggingface"
            "/text-embeddings-inference?tab=readme-ov-file"
            "#supported-models) to see the supported models"
        ),
    )
    is_truncated: Optional[bool] = Param(True, help="Whether to truncate the inputs")

    def client(self, query, texts):
        response = session.post(
            url=self.endpoint_url,
            json={
                "query": query,
                "texts": texts,
                # "is_truncated": self.is_truncated,  # default is True
                "is_truncated": True,  # default is True
            },
        ).json()
        return response

    def split_docs(self, docs: list[str], max_length: int = 512) -> list[str]:
        """Split each document into consecutive chunks of at most max_length characters."""
        split_list = []
        for doc in docs:
            while len(doc) > max_length:
                split_list.append(doc[:max_length])
                doc = doc[max_length:]
            if doc:
                split_list.append(doc)
        return split_list


    def run(self, documents: list[Document], query: str) -> list[Document]:
        """Use the deployed TEI rerankings service to re-order documents
        with their relevance score"""
        if not self.endpoint_url:
            print("TEI API reranking URL not found. Skipping rerankings.")
            return documents

        compressed_docs: list[Document] = []

        if not documents:  # to avoid empty api call
            return compressed_docs

        if isinstance(documents[0], str):
            documents = self.prepare_input(documents)

        batch_size = 6
        num_batch = max(len(documents) // batch_size, 1)
        for i in range(num_batch):
            if i == num_batch - 1:
                mini_batch = documents[batch_size * i :]
            else:
                mini_batch = documents[batch_size * i : batch_size * (i + 1)]
            
            _docs = [d.content for d in mini_batch]
            _docs = self.split_docs(_docs)

            rerank_resp = self.client(query, _docs)
            print(f"rerank_resp: {rerank_resp}")


            # Map each reranked chunk back to the document it came from:
            # split_docs produced ceil(len(content) / 512) chunks per document.
            original_indices = []
            for doc in mini_batch:
                num_parts = (len(doc.content) + 511) // 512
                original_indices.extend([doc] * num_parts)

            for r in rerank_resp:
                print("type r", type(r))
                print("r index ", int(r["index"]))
                original_doc = original_indices[int(r["index"])]
                original_doc.metadata["reranking_score"] = r["score"]
                compressed_docs.append(original_doc)

        compressed_docs = sorted(
            compressed_docs, key=lambda x: x.metadata["reranking_score"], reverse=True
        )
        return compressed_docs

dingshengqin avatar Nov 26 '24 01:11 dingshengqin
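
One footnote on the payload key in the workaround above: recent TEI versions document the rerank option as `truncate`, so whether an `is_truncated` field is honoured may depend on the deployed version. A direct call with server-side truncation, as a sketch assuming a rerank service at http://localhost:8080/rerank, looks like this:

    import requests

    # Sketch: direct call to a TEI rerank deployment with server-side truncation.
    # Adjust the URL to your own endpoint.
    resp = requests.post(
        "http://localhost:8080/rerank",
        json={
            "query": "what does TEI stand for?",
            "texts": ["Text Embeddings Inference is a toolkit ...", "an unrelated passage"],
            "truncate": True,  # clip over-long texts to the model max length
        },
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json():  # each item carries the input index and its relevance score
        print(item["index"], item["score"])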