
MobileViT

Open ludmila3 opened this issue 2 years ago • 6 comments

I am trying to run classification with the MobileViT model and run_image_classification.py. Unfortunately, the script has multiple issues with this model. For example, the line

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)

breaks because the MobileViT image processor does not define image_mean and image_std. Another example is the line

_train_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]

where example_batch does not have a key called "image". Can you provide an updated version of run_image_classification.py?

ludmila3 avatar Jan 20 '23 17:01 ludmila3

Hi,

Note that the example scripts aren't meant to work out-of-the-box with all models, as that would be pretty hard. The scripts are intended to serve as illustration and can be tweaked easily.

In the case of MobileViT, you don't need the normalization operation (rescaling, i.e. dividing by 255 to be in the [0, 1] range, is enough). It's advisable to use data augmentation during training to increase robustness, using a library like torchvision (as in the example), Albumentations, Kornia, etc.

I made a Colab notebook to illustrate running the script for MobileViT. Importantly, I did 2 things:

  1. removed the normalization (sketched below)
  2. passed the ignore_mismatched_sizes argument to the script in order to fine-tune the already fine-tuned apple/mobilevit-small model. Note that this argument isn't needed if you'd like to train a model from scratch.

Note that the arguments still need some tuning; the learning rate and number of steps, for instance, can probably be improved.
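
For illustration, here's a minimal sketch of step 1, assuming the variable names used in the example script (image_processor, _train_transforms):

from torchvision.transforms import (
    Compose, Lambda, Normalize, RandomHorizontalFlip, RandomResizedCrop, ToTensor
)

# ToTensor already rescales pixel values to [0, 1]; only add Normalize when the
# image processor actually defines a mean/std (MobileViT's does not).
normalize = (
    Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    if hasattr(image_processor, "image_mean") and hasattr(image_processor, "image_std")
    else Lambda(lambda x: x)  # no-op placeholder
)
_train_transforms = Compose([
    RandomResizedCrop(256),  # assuming MobileViT's 256x256 input resolution
    RandomHorizontalFlip(),
    ToTensor(),
    normalize,
])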

NielsRogge avatar Jan 20 '23 21:01 NielsRogge

Thank you very much for your fast response! However, my problems arise when trying to evaluate the model on imagenet-1k (beans is not my target dataset), and just removing the normalization from the script does not help. It seems that the image processor chokes on some of the images. The line that breaks is:

example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]

The error is: KeyError: 'image'. For some reason example_batch does not contain the images at all.
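
A quick way to see which columns the dataset actually exposes (ds as in the script below):

print(ds["train"].column_names)  # the script's hardcoded "image"/"labels" keys must match these names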

ludmila3 avatar Jan 21 '23 06:01 ludmila3

I attempted to write my own script; it runs successfully on a number of images and then breaks with:

File "/home/ludmila/transformers/src/transformers/image_utils.py", line 119, in infer_channel_dimension_format
    raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
ValueError: Unsupported number of image dimensions: 2

Here is my script:

from datasets import load_dataset
from transformers import MobileViTFeatureExtractor, MobileViTForImageClassification
import torch
import numpy as np
from datasets import load_metric
from transformers import TrainingArguments
from transformers import Trainer
import os

#ds = load_dataset('/data/imagenet-1k/', ignore_verifications=True)
data_files = {}
data_files["train"] = os.path.join("/data/imagenet/train", "**")
data_files["validation"] = os.path.join("/data/imagenet/val", "**")
ds = load_dataset(
    "imagefolder",
    data_files=data_files,
    #cache_dir="/data/",
    task="image-classification",
)
lab = "labels"  # column name produced by task="image-classification"

model_name_or_path = 'apple/mobilevit-x-small'  # Hub checkpoint id
feature_extractor = MobileViTFeatureExtractor.from_pretrained(model_name_or_path)

def collate_fn(batch):
    # Stack per-example tensors into a single batch for the Trainer
    return {
        'pixel_values': torch.stack([x['pixel_values'] for x in batch]),
        'labels': torch.tensor([x[lab] for x in batch]),
    }

def transform(example_batch):
    # Take a list of PIL images and turn them to pixel values
    inputs = feature_extractor([x for x in example_batch['image']], return_tensors='pt')

    # Don't forget to include the labels!
    inputs[lab] = example_batch[lab]
    return inputs

def compute_metrics(p):
    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)

metric = load_metric("accuracy")
prepared_ds = ds.with_transform(transform)
labels = prepared_ds['train'].features[lab].names

model = MobileViTForImageClassification.from_pretrained(
    model_name_or_path,
    ignore_mismatched_sizes=True,
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)}
)

training_args = TrainingArguments(
  output_dir="./output",
  per_device_train_batch_size=16,
  evaluation_strategy="steps",
  num_train_epochs=0,  # evaluation only below, so no training epochs
  fp16=False,
  save_steps=100,
  eval_steps=100,
  logging_steps=10,
  learning_rate=2e-4,
  save_total_limit=2,
  remove_unused_columns=False,
  push_to_hub=False,
  report_to='tensorboard',
  load_best_model_at_end=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    train_dataset=prepared_ds["train"],
    eval_dataset=prepared_ds["validation"],
    tokenizer=feature_extractor,
)

metrics = trainer.evaluate(prepared_ds['validation'])
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

ludmila3 avatar Jan 21 '23 06:01 ludmila3

It looks like you need to convert your images to RGB before processing them: image = Image.open(...).convert("RGB").
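
Applied to the transform function in the script above, that would look something like this (a sketch; only the RGB conversion is new):

def transform(example_batch):
    # Convert to RGB first, so grayscale (2D) and palette images get 3 channels
    images = [img.convert("RGB") for img in example_batch['image']]
    inputs = feature_extractor(images, return_tensors='pt')
    inputs[lab] = example_batch[lab]
    return inputs

The ValueError above comes from grayscale images in ImageNet: they load as 2D arrays, so the image processor cannot infer a channel dimension.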

NielsRogge avatar Apr 03 '23 07:04 NielsRogge

Any solution to this? The line

example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]

fails with KeyError: 'image'; example_batch has no "image" key.

umarkhalidAI avatar Jun 26 '23 20:06 umarkhalidAI

Check whether your dataset contains PNG files and delete them; that helped in my case.
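
For instance, something like this (path assumed from the script above):

import glob

# List PNG files under the imagefolder root; delete or convert them if any turn up
png_files = glob.glob("/data/imagenet/**/*.png", recursive=True)
print(len(png_files), png_files[:5])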

mjamroz avatar Jun 30 '23 19:06 mjamroz