Image eats up way too many tokens
System Info
Using an Inference Endpoint here: https://endpoints.huggingface.co/m-ric/endpoints/qwen2-72b-instruct-psj, running the image ghcr.io/huggingface/text-generation-inference:3.0.1
Information
- [ ] Docker
- [x] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
Here's what I'm trying to run:
```python
import base64
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    base_url="https://lmqbs8965pj40e01.us-east-1.aws.endpoints.huggingface.cloud/v1",
    api_key=os.getenv("HF_TOKEN"),
)

# Encode the screenshot as a base64 data URL
with open("./screenshot.png", "rb") as img_file:
    base64_image = base64.b64encode(img_file.read()).decode("utf-8")

client.chat.completions.create(
    model="a",  # placeholder; the endpoint serves a single model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's on this screenshot?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)
```
The image is not big, here it is: *(screenshot attached in the issue, roughly 1000×1000 px)*
I get this error:
```
huggingface_hub.errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://lmqbs8965pj40e01.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions (Request ID: 9kQ8on)
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 32768. Given: 96721 `inputs` tokens and 0 `max_new_tokens`
```
It seems like my image was expanded into a very large number of tokens, even though the original is only roughly 1000×1000 pixels.
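For scale, here's a rough back-of-the-envelope check. It assumes a Qwen2-VL-style vision encoder (14 px patches merged 2×2, i.e. roughly one token per 28×28-pixel tile), which may not match the exact preprocessing this endpoint applies:

```python
# Rough sketch only: assumes ~1 token per 28x28-pixel tile (Qwen2-VL-style).
PATCH = 28

# Expected token count for the original ~1000x1000 screenshot:
print((1000 // PATCH) ** 2)     # ~1225 tokens

# The 96721 tokens reported in the error is exactly a 311x311 tile grid,
# which would correspond to an image of about 8708x8708 pixels:
print(311 * 311, 311 * PATCH)   # 96721 8708
```

If those assumptions hold, the image is being scaled up roughly 8x per side somewhere in preprocessing, rather than being tokenized at its native resolution.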
Expected behavior
I'd expect the uploaded image to be <1k tokens instead of ~100k tokens.
Other APIs (OpenAI, Anthropic) handle the same image fine, so I'm wondering: do they apply some image-downscaling pre-processing, or is this a bug on the TGI side?
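In the meantime, here is a minimal client-side workaround sketch, assuming Pillow is available and that capping the longest side (the 768 px limit here is an arbitrary choice, not a documented TGI value) keeps the token count in range:

```python
import base64
import io

from PIL import Image

MAX_SIDE = 768  # assumption: arbitrary cap on the longest side

def encode_image_downscaled(path: str) -> str:
    """Downscale an image (aspect ratio preserved) and return it as base64."""
    img = Image.open(path)
    img.thumbnail((MAX_SIDE, MAX_SIDE))  # in-place; only ever shrinks, never enlarges
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")

# Drop-in replacement for the encoding step in the reproduction script:
base64_image = encode_image_downscaled("./screenshot.png")
```

Note this only helps if the token count tracks the uploaded resolution; given the numbers above, the blow-up may instead happen server-side, in which case no client-side resizing would fix it.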