Image eats up way too many tokens
System Info
Using an Inference Endpoint here: https://endpoints.huggingface.co/m-ric/endpoints/qwen2-72b-instruct-psj, running the image ghcr.io/huggingface/text-generation-inference:3.0.1
Information
- [ ] Docker
- [x] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
Here's what I'm trying to run:
```python
import base64
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    base_url="https://lmqbs8965pj40e01.us-east-1.aws.endpoints.huggingface.cloud/v1",
    api_key=os.getenv("HF_TOKEN"),
)

# Encode the screenshot as a base64 data URL
with open("./screenshot.png", "rb") as img_file:
    base64_image = base64.b64encode(img_file.read()).decode("utf-8")

client.chat.completions.create(
    model="a",  # placeholder; the endpoint serves a single model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's on this screenshot?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)
```
The image is not big, here it is: *(screenshot attached in the issue, roughly 1000×1000 px)*
I get this error:
```
huggingface_hub.errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://lmqbs8965pj40e01.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions (Request ID: 9kQ8on)
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 32768. Given: 96721 `inputs` tokens and 0 `max_new_tokens`
```
It seems like my image was expanded into a very large number of tokens, even though the original is only roughly 1000×1000 pixels.
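For scale, here's a rough back-of-the-envelope check. It assumes a Qwen2-VL-style vision encoder (14 px patches merged 2×2, i.e. roughly one token per 28×28-pixel tile), which may not match the exact preprocessing this endpoint applies:

```python
# Rough sketch only: assumes ~1 token per 28x28-pixel tile (Qwen2-VL-style).
PATCH = 28

# Expected token count for the original ~1000x1000 screenshot:
print((1000 // PATCH) ** 2)     # ~1225 tokens

# The 96721 tokens reported in the error is exactly a 311x311 tile grid,
# which would correspond to an image of about 8708x8708 pixels:
print(311 * 311, 311 * PATCH)   # 96721 8708
```

If those assumptions hold, the image is being scaled up roughly 8x per side somewhere in preprocessing, rather than being tokenized at its native resolution.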
Expected behavior
I'd expect the uploaded image to be <1k tokens instead of ~100k tokens.
Other APIs (OpenAI, Anthropic) handle the same image fine, so I'm wondering: do they apply some image-downscaling pre-processing, or is this a bug on the TGI side?
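In the meantime, here is a minimal client-side workaround sketch, assuming Pillow is available and that capping the longest side (the 768 px limit here is an arbitrary choice, not a documented TGI value) keeps the token count in range:

```python
import base64
import io

from PIL import Image

MAX_SIDE = 768  # assumption: arbitrary cap on the longest side

def encode_image_downscaled(path: str) -> str:
    """Downscale an image (aspect ratio preserved) and return it as base64."""
    img = Image.open(path)
    img.thumbnail((MAX_SIDE, MAX_SIDE))  # in-place; only ever shrinks, never enlarges
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")

# Drop-in replacement for the encoding step in the reproduction script:
base64_image = encode_image_downscaled("./screenshot.png")
```

Note this only helps if the token count tracks the uploaded resolution; given the numbers above, the blow-up may instead happen server-side, in which case no client-side resizing would fix it.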