Input validation error: `inputs` must have less than 512 tokens. Given: 534
System Info
- text-embeddings-router 1.1.0
- Python 3.10
- CentOS
- A800
Information
- [ ] Docker
- [X] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
I started an embedding server with text-embeddings-router, but requests always fail with the error `Input validation error: inputs must have less than 512 tokens`. Which parameter should I use to change the maximum number of input tokens? 512 is too short, and I cannot find the right parameter in the README or in `--help`.
Expected behavior
Which parameter should I use to change the maximum number of input tokens?
Which model are you using? The max length of inputs is usually determined by the model's max length, and TEI usually provides a truncate parameter to decide whether you want to shorten the text or return an error.
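For example, with a TEI server already running, a minimal sketch of passing `truncate` in the request payload could look like this (the localhost URL and input text are placeholders; adjust them to your deployment):

```python
import requests

# Assumes a TEI embedding server is reachable locally; adjust the URL to your deployment.
TEI_URL = "http://localhost:8080/embed"

resp = requests.post(
    TEI_URL,
    json={
        "inputs": "a very long text that would otherwise exceed the 512-token limit ...",
        # Ask the server to truncate over-long inputs to the model's max length
        # instead of returning the validation error.
        "truncate": True,
    },
)
resp.raise_for_status()
embedding = resp.json()[0]  # /embed returns one vector per input
```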
To solve the `Input validation error: inputs must have less than 512 tokens` issue, I added the `--auto-truncate` parameter when starting the Docker image, but the container failed to start. What could be the reason? The startup command is as follows: `docker run --rm --gpus all -d -p 18082:80 --name multilingual-e5-large -v 'model mount address' 'mirror address' --model-id 'model id' --auto-truncate true`
Sorry, it's a problem with the embedding model, not TEI.
When starting the TEI image with the `--auto-truncate true` parameter added, the container fails to start. What could be the reason?
I'm using the TEI client, not the Docker image.
I'm also facing the same issue: `inputs must have less than 512 tokens`.
I'm getting the 512 error for BAAI bge-large-en-v1.5. What needs to be changed here?
The code is based on automatic_embedding_tei_inference_endpoints#inference-endpoints
```python
from huggingface_hub import InferenceEndpointType, create_inference_endpoint

ep = create_inference_endpoint(
    ep_name,
    repository="BAAI/bge-large-en-v1.5",
    framework="pytorch",
    accelerator="gpu",
    instance_size="x1",
    instance_type="nvidia-l4",
    region="us-east-1",
    vendor="aws",
    min_replica=0,
    max_replica=1,
    task="sentence-embeddings",
    type=InferenceEndpointType.PROTECTED,
    namespace="newsrx",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-embeddings-inference:1.5.0",
        "env": {
            "MAX_BATCH_TOKENS": "16384",
            "MAX_CONCURRENT_REQUESTS": "512",
            "MODEL_ID": "/repository",
            "QUANTIZE": "eetq",
        },
    },
)
```
If you pass truncate=True in the payload, it will automatically truncate the input and you won't have this issue.
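As a rough sketch of what that looks like against the endpoint created above (authentication via `get_token` is only one option; adapt it to how you manage tokens):

```python
import requests
from huggingface_hub import get_token

# `ep` is the InferenceEndpoint created above; wait until it is up before querying.
ep.wait()

resp = requests.post(
    ep.url,
    headers={"Authorization": f"Bearer {get_token()}"},  # PROTECTED endpoints need a token
    json={
        "inputs": ["first long document ...", "second long document ..."],
        # Let TEI truncate over-long inputs instead of rejecting the whole request.
        "truncate": True,
    },
)
resp.raise_for_status()
embeddings = resp.json()  # one embedding vector per input
```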
A better chunking strategy might also be useful rather than simple truncation
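For instance, a token-aware chunking sketch (the chunk size, overlap, and the choice of tokenizer are assumptions; tune them to your retrieval setup):

```python
from transformers import AutoTokenizer

# Use the embedding model's own tokenizer so chunk lengths match what TEI counts.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

def chunk_text(text: str, max_tokens: int = 480, overlap: int = 32) -> list[str]:
    """Split `text` into overlapping chunks of at most `max_tokens` tokens.

    `max_tokens` stays below 512 to leave room for the special tokens
    the model adds around each sequence.
    """
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(ids), step):
        window = ids[start : start + max_tokens]
        chunks.append(tokenizer.decode(window))
        if start + max_tokens >= len(ids):
            break
    return chunks
```

Each chunk can then be embedded (or reranked) separately and the scores or vectors combined per document.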
truncate and --auto-truncate are not feasible for me. I fixed this problem by replacing tei_fast_rerank.py entirely with the following code:
```python
from __future__ import annotations

from typing import Optional

import requests

from kotaemon.base import Document, Param

from .base import BaseReranking

session = requests.session()


class TeiFastReranking(BaseReranking):
    endpoint_url: str = Param(
        None, help="TEI Reranking service api base URL", required=True
        # 'http://localhost:8080/rerank', help="TEI Reranking service api base URL", required=True
    )
    model_name: Optional[str] = Param(
        None,
        # "BAAI/bge-reranker-large",
        help=(
            "ID of the model to use. You can go to [Supported Models]"
            "(https://github.com/huggingface"
            "/text-embeddings-inference?tab=readme-ov-file"
            "#supported-models) to see the supported models"
        ),
    )
    is_truncated: Optional[bool] = Param(True, help="Whether to truncate the inputs")

    def client(self, query, texts):
        response = session.post(
            url=self.endpoint_url,
            json={
                "query": query,
                "texts": texts,
                # "is_truncated": self.is_truncated,  # default is True
                "is_truncated": True,  # default is True
            },
        ).json()
        return response

    def split_docs(self, docs: list[str], max_length: int = 512) -> list[str]:
        """Split each document into chunks of at most `max_length` characters."""
        split_list = []
        for doc in docs:
            while len(doc) > max_length:
                split_list.append(doc[:max_length])
                doc = doc[max_length:]
            if doc:
                split_list.append(doc)
        return split_list

    def run(self, documents: list[Document], query: str) -> list[Document]:
        """Use the deployed TEI reranking service to re-order documents
        by their relevance score"""
        if not self.endpoint_url:
            print("TEI API reranking URL not found. Skipping rerankings.")
            return documents

        compressed_docs: list[Document] = []

        if not documents:  # to avoid empty api call
            return compressed_docs

        if isinstance(documents[0], str):
            documents = self.prepare_input(documents)

        batch_size = 6
        num_batch = max(len(documents) // batch_size, 1)
        for i in range(num_batch):
            if i == num_batch - 1:
                mini_batch = documents[batch_size * i :]
            else:
                mini_batch = documents[batch_size * i : batch_size * (i + 1)]

            _docs = [d.content for d in mini_batch]
            # Split long documents into 512-character chunks before sending them to TEI
            _docs = self.split_docs(_docs)
            rerank_resp = self.client(query, _docs)
            print(f"rerank_resp: {rerank_resp}")

            # Map each chunk index back to the document it was split from
            original_indices = []
            for doc in mini_batch:
                num_parts = (len(doc.content) + 511) // 512
                original_indices.extend([doc] * num_parts)

            for r in rerank_resp:
                print("type r", type(r))
                print("r index ", int(r["index"]))
                original_doc = original_indices[int(r["index"])]
                original_doc.metadata["reranking_score"] = r["score"]
                compressed_docs.append(original_doc)

        compressed_docs = sorted(
            compressed_docs, key=lambda x: x.metadata["reranking_score"], reverse=True
        )
        return compressed_docs
```