
PixtralProcessor always returns outputs of length 1

Infernaught opened this issue 1 year ago • 0 comments

System Info

  • transformers version: 4.45.2
  • Platform: Linux-5.4.0-1113-oracle-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.2
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: False
  • Using GPU in script?: True
  • GPU type: NVIDIA RTX A5000

Who can help?

@ArthurZucker @amyeroberts Hello! I'm working on fine-tuning a VLM, and during my dataset preprocessing I noticed that PixtralProcessor always returns the input IDs for only the first example in a batch. I believe the cause is the lines here, which aggregate all of the images into a length-1 list of lists; that in turn truncates the iteration over the zip here. Is this unintended behavior, or am I doing something wrong?
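To make the suspected failure mode concrete, here is a minimal, simplified sketch in plain Python (the names are hypothetical, not the actual processor internals): if all images get collapsed into a single inner list, zipping texts against that list yields only one pair, because zip stops at the shorter sequence.

```python
texts = ["prompt 1", "prompt 2", "prompt 3"]

# Suspected buggy normalization: every image collapsed into ONE inner list
images_buggy = [["img_a", "img_b", "img_c"]]

# zip truncates to the shorter input, so only the first example survives
print(len(list(zip(texts, images_buggy))))   # 1

# Expected normalization: one inner list per example
images_expected = [["img_a"], ["img_b"], ["img_c"]]
print(len(list(zip(texts, images_expected))))  # 3
```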

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# prompts: a list of 50 text prompts; images: a list of 50 decoded images
batch = processor(text=prompts, images=images, padding=True, return_tensors="pt")
# Bug: the returned batch contains only the first of the 50 elements

Expected behavior

I would expect it to return the outputs corresponding to all 50 prompts and images.

Infernaught avatar Oct 16 '24 22:10 Infernaught