
PixtralProcessor always returns outputs of length 1

Infernaught opened this issue 1 year ago • 0 comments

System Info

  • transformers version: 4.45.2
  • Platform: Linux-5.4.0-1113-oracle-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.2
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: False
  • Using GPU in script?: True
  • GPU type: NVIDIA RTX A5000

Who can help?

@ArthurZucker @amyeroberts Hello! I'm working on fine-tuning a VLM, and during my dataset preprocessing I noticed that PixtralProcessor always returns the input IDs for only the first example in a batch. I believe the cause is the lines here, which aggregate all of the images into a length-1 list of lists; that in turn truncates the iteration over the zip here. Is this unintended behavior, or am I doing something wrong?
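To make the suspected failure mode concrete, here is a minimal, simplified sketch in plain Python (the names are hypothetical, not the actual processor internals): if all images get collapsed into a single inner list, zipping texts against that list yields only one pair, because zip stops at the shorter sequence.

```python
texts = ["prompt 1", "prompt 2", "prompt 3"]

# Suspected buggy normalization: every image collapsed into ONE inner list
images_buggy = [["img_a", "img_b", "img_c"]]

# zip truncates to the shorter input, so only the first example survives
print(len(list(zip(texts, images_buggy))))   # 1

# Expected normalization: one inner list per example
images_expected = [["img_a"], ["img_b"], ["img_c"]]
print(len(list(zip(texts, images_expected))))  # 3
```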

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# prompts: a list of 50 text prompts; images: a list of 50 decoded images
batch = processor(text=prompts, images=images, padding=True, return_tensors="pt")
# Bug: the returned batch contains only the first of the 50 elements

Expected behavior

I would expect it to return the outputs corresponding to all 50 prompts and images.

Infernaught avatar Oct 16 '24 22:10 Infernaught