                        PixtralProcessor always returns outputs of length 1
System Info
- `transformers` version: 4.45.2
- Platform: Linux-5.4.0-1113-oracle-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.2
- Safetensors version: 0.4.5
- Accelerate version: 0.34.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: False
- Using GPU in script?: True
- GPU type: NVIDIA RTX A5000
Who can help?
@ArthurZucker @amyeroberts

Hello! I'm working on fine-tuning a VLM, and during dataset preprocessing I noticed that `PixtralProcessor` always returns the input IDs for only the first example in a batch. I believe this is caused by the lines here, which aggregate all of the images into a length-1 list of lists, and that in turn breaks the iteration over the zip here. Is this unintended behavior, or am I doing something wrong?
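The failure mode I suspect can be sketched in plain Python, without `transformers` (the names `prompts`/`images` and the internal wrapping step are illustrative assumptions, not the actual processor code):

```python
# Simulate a batch of 50 prompts and 50 images.
prompts = [f"prompt {i}" for i in range(50)]
images = [f"image {i}" for i in range(50)]

# What the processor appears to do internally: collapse the whole image
# batch into a single sublist, i.e. a length-1 list of lists.
aggregated = [images]

# zip stops at the shorter iterable, so only one (prompt, images) pair
# survives -- matching the "only the first example" symptom.
pairs = list(zip(prompts, aggregated))
print(len(pairs))  # 1

# Keeping one sublist of images per prompt preserves the full batch.
per_sample = [[img] for img in images]
pairs = list(zip(prompts, per_sample))
print(len(pairs))  # 50
```

This is only a reproduction of the zip-truncation mechanism, not a patch for the processor itself.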
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")

# Define `prompts` as a list of 50 text prompts and `images` as a list of
# 50 decoded images.
batch = processor(prompts, images, padding=True, return_tensors="pt")
# Returns a batch containing only the first of the 50 elements.
```
Expected behavior
I would expect it to return the outputs corresponding to all 50 prompts and images.