
PaliGemma detection task is failing

Open · nph4rd opened this issue on Aug 20 '24 · 3 comments

System Info

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0              49W / 400W |  34683MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      7589      G   /usr/lib/xorg/Xorg                           95MiB |
|    0   N/A  N/A      7836      G   /usr/bin/gnome-shell                         12MiB |
|    0   N/A  N/A     43844      C   /opt/conda/bin/python3.10                 34552MiB |
+---------------------------------------------------------------------------------------+

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

I'm running Google's PaliGemma 448-res model with Docker:

model=google/paligemma-3b-pt-448
volume=$PWD/data
token=<your Hugging Face access token>

docker run --gpus all --shm-size 1g -e HF_TOKEN=$token -p 8080:80 \
    -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.2.0 \
    --model-id $model

I'm trying to run the detection task using the example code from the docs:

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:8080")
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
prompt = f"![]({image})detect rabbit\n\n"
for token in client.text_generation(prompt, max_new_tokens=16, stream=True):
    print(token)

# This is a picture of an anthropomorphic rabbit in a space suit.

But all I get as output is <eos>.
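One way to narrow this down (a debugging sketch, assuming the same server as above) is to request the full generation details instead of streaming, which exposes the individual token ids the server produced. For a detection prompt, PaliGemma is expected to emit four <loc####> tokens per bounding box followed by the label, so this shows whether the model generates nothing but EOS or whether <loc> tokens are produced but lost somewhere in decoding:

from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:8080")
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
prompt = f"![]({image})detect rabbit\n\n"

# details=True returns the decoded text plus per-token metadata
out = client.text_generation(prompt, max_new_tokens=16, details=True)
print(out.generated_text)
for tok in out.details.tokens:
    # id, decoded string, and whether the tokenizer flags it as special
    print(tok.id, repr(tok.text), tok.special)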

I raised a similar issue in vLLM: https://github.com/vllm-project/vllm/issues/7115

I suspect it has to do with the special <loc> tokens. In the Transformers implementation, the extra <loc> tokens are added to the Gemma tokenizer by the PaliGemma processor:

https://github.com/huggingface/transformers/blob/25245ec26dc29bcf6102e1b4ddd0dfd02e720cf5/src/transformers/models/paligemma/processing_paligemma.py#L38

I'm not sure whether that's the cause, though.
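One quick way to test that hypothesis (a sketch; it assumes you have access to the gated checkpoint and HF_TOKEN set) is to check whether the tokenizer files on the Hub already contain the <loc>/<seg> tokens, or whether they only exist after the Transformers processor injects them at runtime:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-pt-448")
ids = tokenizer.convert_tokens_to_ids(["<loc0000>", "<loc1023>", "<seg000>"])
print(ids)                    # valid ids if the tokens are in the vocab,
                              # the unk id otherwise
print(tokenizer.decode(ids))  # should round-trip to the same tokens

If TGI loads the tokenizer without the processor's token-injection step, the <loc> ids could decode differently than they do under Transformers, which might explain the mismatch.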

Expected behavior

I tried running the same example in this HF Space, and it ran successfully:

[Screenshot from 2024-08-19 showing the detection prompt running successfully in the Space]

nph4rd · Aug 20 '24