
Resize(feature_extractor.size) is a dictionary, not an int or sequence

Open dehlong opened this issue on Apr 23, 2024 • 4 comments

System Info

Hello, this bug was already addressed once (https://discuss.huggingface.co/t/image-classification-tutorial-bug/37267), but it arose again today on Google Colab when running this script.

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
train_transforms = Compose(
        [
            RandomResizedCrop(feature_extractor.size),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

val_transforms = Compose(
        [
            Resize(feature_extractor.size), ## this is the error line
            CenterCrop(feature_extractor.size),
            ToTensor(),
            normalize,
        ]
    )

def preprocess_train(example_batch):
    """Apply train_transforms across a batch."""
    example_batch["pixel_values"] = [
        train_transforms(image.convert("RGB")) for image in example_batch["image"]
    ]
    return example_batch

def preprocess_val(example_batch):
    """Apply val_transforms across a batch."""
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch
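
For what it's worth, the failure reproduces in isolation (the dict value below is illustrative), since torchvision's Resize only accepts an int or a sequence:

from torchvision.transforms import Resize

# feature_extractor.size is a dict on recent transformers versions
Resize({"height": 224, "width": 224})
# TypeError: Size should be int or sequence. Got <class 'dict'>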

I tried installing the exact version of transformers suggested in this answer, but it did not help: https://stackoverflow.com/questions/76142308/fixerror-typeerror-size-should-be-int-or-sequence-got-class-dict

Here is my `transformers` environment info:

- `transformers` version: 4.23.1
- Platform: Linux-6.1.58+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.22.2
- PyTorch version (GPU?): 2.2.1+cu121 (False)
- Tensorflow version (GPU?): 2.15.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.8.2 (cpu)
- Jax version: 0.4.26
- JaxLib version: 0.4.26
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No

Who can help?

@amyeroberts

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

  1. Get my dataset from https://huggingface.co/datasets/samokosik/clothes
  2. Try to preprocess it via the script I submitted.
  3. You will get this error: TypeError: Size should be int or sequence. Got <class 'dict'>

Expected behavior

Not assigning a dictionary to size.

dehlong · Apr 23 '24 15:04

Hi @dehlong, thanks for opening an issue!

For future issues, please make sure to share:

  • The running environment: run transformers-cli env in the terminal and copy-paste the output
  • Information about the error encountered, including full traceback

The feature extractors for vision models have been deprecated for a while now (over a year), with image processors taking their place. The image processors have size stored as a dictionary. This is to disambiguate resizing behaviour, as previously size could be used to define the shortest edge, or to define the height and width.
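
For example, you can inspect the two shapes directly with a recent transformers release (the checkpoints below are only illustrative; exact values depend on each checkpoint's preprocessor config):

from transformers import AutoImageProcessor

# ViT-style processors resize to an exact height and width.
print(AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k").size)
# {'height': 224, 'width': 224}

# ConvNeXt-style processors resize the shortest edge instead.
print(AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224").size)
# {'shortest_edge': 224}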

You can see up-to-date examples of how to use them in our example scripts, e.g. run_image_classification.py for image classification.

The updated script would look like this:

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

# image_processor.size is a dict; convert it to the int or tuple
# that the torchvision transforms expect.
size = image_processor.size
if "height" in size:
    # Exact output size, e.g. {'height': 224, 'width': 224}
    crop_size = (size["height"], size["width"])
    resize_size = (size["height"], size["width"])
elif "shortest_edge" in size:
    # Shortest-edge resizing: a single int, e.g. {'shortest_edge': 224}
    crop_size = resize_size = size["shortest_edge"]

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
        [
            RandomResizedCrop(crop_size),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

val_transforms = Compose(
        [
            Resize(resize_size),
            CenterCrop(crop_size),
            ToTensor(),
            normalize,
        ]
    )

def preprocess_train(example_batch):
    """Apply train_transforms across a batch."""
    example_batch["pixel_values"] = [
        train_transforms(image.convert("RGB")) for image in example_batch["image"]
    ]
    return example_batch

def preprocess_val(example_batch):
    """Apply val_transforms across a batch."""
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch
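
To hook these into your dataset, a minimal sketch (the split names are assumed; adjust them to the dataset's actual splits):

from datasets import load_dataset

dataset = load_dataset("samokosik/clothes")

# set_transform applies the preprocessing lazily, per batch, when examples are accessed.
dataset["train"].set_transform(preprocess_train)
dataset["test"].set_transform(preprocess_val)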

Out of interest - where did you get this example from? It would be great to know in case there are places in our resources or documentation we need to make sure are updated.

amyeroberts · Apr 23 '24 17:04

Hello, thank you for the reply. However, may I ask whether there are any caveats with the image processor? The line size = image_processor.size gives an error that image_processor is not defined. I tried importing it directly from transformers, but with no success. Or is there something I am unaware of, and am I supposed to build my own image processor like the one displayed in the code you sent?

Also, regarding the example: I got it from Rajistics (https://www.youtube.com/watch?v=ahgB8c_TgA8).

dehlong · Apr 23 '24 19:04

Yes, you need to define the image processor in the same way you defined the feature extractor:

from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained(checkpoint)
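
Here checkpoint is the model id (or local path) you were already passing to the feature extractor; a minimal sketch with an illustrative checkpoint:

checkpoint = "google/vit-base-patch16-224-in21k"  # substitute your own model id
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
print(image_processor.size)  # a dict, e.g. {'height': 224, 'width': 224}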

amyeroberts · Apr 24 '24 09:04

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · May 24 '24 08:05