
1x1 input image can crash /v1/embeddings endpoint until pod restart while health check endpoint continues to return 200

Open scottt732 opened this issue 6 months ago • 6 comments

System Info

We have an image processing pipeline where we're using Infinity behind KubeAI to compute CLIP embeddings for images.

curl -vv http://kubeai/openai/v1/embeddings \
  --http1.1 -H "Authorization: Bearer blah" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "https://some.image.url/image.png",
    "model": "infinity-hosted-clip-embedding-model",
    "modality": "image"
  }'

Every now and then a single bad request comes along (I'm still working on isolating it). From that point on, our steady stream of embeddings, health, and metrics calls becomes a steady stream of just health and metrics calls, both of which continue to return 200s. The model stays running but stops receiving any embedding requests from KubeAI, and we essentially lose 1/n of our throughput.

INFO:     10.15.5.245:57645 - "POST /v1/embeddings HTTP/1.1" 200 OK
ERROR    2025-06-26 13:23:51,860 infinity_emb ERROR: broken data stream when reading image file    batch_handler.py:557
         Traceback (most recent call last):
           File "/app/infinity_emb/inference/batch_handler.py", line 541, in _preprocess_batch
             feat = self._model.encode_pre(items_for_pre)
           File "/app/infinity_emb/transformer/vision/torch_vision.py", line 159, in encode_pre
             preprocessed = self.processor(
           File "/app/.venv/lib/python3.10/site-packages/transformers/models/clip/processing_clip.py", line 109, in __call__
             image_features = self.image_processor(images, return_tensors=return_tensors, **image_processor_kwargs)
           File "/app/.venv/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 41, in __call__
             return self.preprocess(images, **kwargs)
           File "/app/.venv/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 307, in preprocess
             images = [convert_to_rgb(image) for image in images]
           File "/app/.venv/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 307, in <listcomp>
             images = [convert_to_rgb(image) for image in images]
           File "/app/.venv/lib/python3.10/site-packages/transformers/image_transforms.py", line 776, in convert_to_rgb
             image = image.convert("RGB")
           File "/app/.venv/lib/python3.10/site-packages/PIL/Image.py", line 995, in convert
             self.load()
           File "/app/.venv/lib/python3.10/site-packages/PIL/ImageFile.py", line 312, in load
             raise _get_oserror(err_code, encoder=False)
         OSError: broken data stream when reading image file

Any ideas?

Information

  • [x] Docker + cli
  • [ ] pip + cli
  • [ ] pip + usage of Python interface

Tasks

  • [x] An officially supported CLI command
  • [ ] My own modifications

Reproduction

I'm trying to find the image URL that reproduces this. I'll report back if I find one.

scottt732 · Jun 26 '25 22:06

I was able to find a repro that works consistently directly against infinity (updated issue to remove KubeAI).

curl -vv http://localhost:50574/v1/embeddings \
  --http1.1 -H "Authorization: Bearer doesnt-matter" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAusB9sXsdrgAAAAASUVORK5CYII=",
    "model": "repro", 
    "modality": "image"
  }'
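In case it helps with reproducing against other inputs, here's a small sketch (my own, assuming Pillow is installed; not part of Infinity) that generates a comparable 1x1 data: URI:

# Sketch only: emit a 1x1 PNG as a data: URI, similar to the payload in the curl command above.
import base64
import io

from PIL import Image

buf = io.BytesIO()
Image.new("RGB", (1, 1)).save(buf, format="PNG")
print("data:image/png;base64," + base64.b64encode(buf.getvalue()).decode("ascii"))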

The issue has to do with data:-encoded images that are 1x1 with 1 or 3 channels (RGB): the normalization that happens in huggingface/transformers can't determine the channel order of a [1, 1, 3] array (channels-first vs. channels-last) because the shape is ambiguous.
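To make the ambiguity concrete, here's a minimal illustration (my own sketch using numpy and Pillow, not Infinity or transformers code):

# Illustration only: a 1x1 RGB image yields an array of shape (1, 1, 3), which could be
# read as H=1, W=1, C=3 (channels-last) or C=1, H=1, W=3 (channels-first);
# nothing in the shape itself disambiguates the two.
import numpy as np
from PIL import Image

arr = np.array(Image.new("RGB", (1, 1)))
print(arr.shape)  # (1, 1, 3)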

The bigger problem here is that infinity doesn't handle this error gracefully.

* Host localhost:50574 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:50574...
* Connected to localhost (::1) port 50574
> POST /v1/embeddings HTTP/1.1
> Host: localhost:50574
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer doesnt-matter
> Content-Type: application/json
> Content-Length: 204
>
* upload completely sent off: 204 bytes

We never get a reply... it just eventually times out. All future requests to the /v1/embeddings endpoint also fail from that point on until we kill and replace the pod, while the /health and /metrics endpoints keep returning 200s.

KubeAI assumed the pod was healthy because of the 200s and kept spreading traffic across all Infinity instances. This becomes a slow-motion DoS: once each pod has encountered the bad input once, no more CLIP embeddings are served at all, for text or images.

❯ curl -vv http://localhost:50574/v1/embeddings \
  --http1.1 -H "Authorization: Bearer doesnt-matter" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "hi",
    "model": "repro", 
    "modality": "text"
  }'
* Host localhost:50574 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:50574...
* Connected to localhost (::1) port 50574
> POST /v1/embeddings HTTP/1.1
> Host: localhost:50574
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer doesnt-matter
> Content-Type: application/json
> Content-Length: 90
>
* upload completely sent off: 90 bytes

scottt732 · Jun 30 '25 17:06

Which version of infinity are you using? Newer releases have validation that rejects 1x1 images.

michaelfeil · Jun 30 '25 23:06

Just checked: the current check is only applied to images provided via URL. The linked PR applies the check to all input methods. With the check in place, the server returns a "Bad Request" for images smaller than 3x3, which also covers the example provided. This will prevent the problem described.
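For context, a rough sketch of what such a validation could look like (my illustration, assuming a FastAPI-style handler and Pillow; not the actual code from the linked PR):

# Sketch only: reject images smaller than 3x3 before they reach the CLIP processor.
from fastapi import HTTPException
from PIL import Image

MIN_SIDE = 3

def validate_image(img: Image.Image) -> Image.Image:
    width, height = img.size
    if width < MIN_SIDE or height < MIN_SIDE:
        raise HTTPException(
            status_code=400,
            detail=f"Image too small ({width}x{height}); minimum is {MIN_SIDE}x{MIN_SIDE}",
        )
    return img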

wirthual · Jul 01 '25 06:07

Thanks! Just want to make sure this didn't get lost above: this unhandled exception broke the /v1/embeddings call that triggered it (it never got a response) and prevented all future calls to that endpoint from working at all. Maybe a missing with block or try/finally around a queue slot or something? (See the sketch below.)

Image is: michaelf34/infinity:0.0.76
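To illustrate the kind of pattern I mean (just a sketch of the general idea, not Infinity's actual batch_handler code; encode_item is a made-up placeholder):

# Sketch only: release the capacity slot even when preprocessing raises,
# so one bad image can't wedge the endpoint for all subsequent requests.
import asyncio

async def embed_one(item, slots: asyncio.Semaphore):
    async with slots:                   # slot is released on success and on error
        return await encode_item(item)  # placeholder; may raise OSError on a broken image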

scottt732 · Jul 01 '25 14:07

I added additional changes in this branch: #614

@scottt732 Could you give this branch a spin and report back if it solves your issue?

wirthual · Jul 17 '25 18:07

I think @scottt732 was not on the most recent branch. This issue should have been fixed a long time ago.

michaelfeil · Aug 22 '25 23:08