Transformers-Tutorials
MobileViT
I am trying to run classification using the MobileViT model and the run_image_classification.py script.
Unfortunately, the script has multiple issues with this model.
For example, the line:
normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
breaks because the MobileViT model does not have image_mean and image_std defined in its image_processor.
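A possible workaround (my own guess, not an official fix) would be to guard that line:

from torchvision.transforms import Lambda, Normalize

# Hypothetical guard: only normalize when the image processor defines the stats.
if getattr(image_processor, "image_mean", None) is not None:
    normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
else:
    normalize = Lambda(lambda x: x)  # identity: MobileViT only expects rescaled pixels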
Another example is in the line:
_train_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]
example_batch does not have a key called "image"
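For reference, the actual column names can be printed upfront; the image column is named differently across datasets (cifar10 uses "img", for instance):

from datasets import load_dataset

ds = load_dataset("beans")  # "beans" is just an example; substitute your dataset
print(ds["train"].column_names)  # beans yields ['image_file_path', 'image', 'labels']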
Can you provide an updated version of run_image_classification.py ?
Hi,
Note that the example scripts aren't meant to work out-of-the-box with all models, as that would be pretty hard. The scripts are intended to serve as illustration and can be tweaked easily.
In the case of MobileViT, you don't need the normalization operation (rescaling, i.e. dividing by 255 to be in the [0, 1] range, is enough). It's advised to use data augmentation operations during training to increase robustness, using a library like torchvision (as in the example), Albumentations, Kornia, etc.
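For example, a minimal set of transforms without a Normalize step could look like the following (the 256/288 sizes are an assumption based on MobileViT's usual input resolution, so adapt as needed):

from torchvision.transforms import (CenterCrop, Compose, RandomHorizontalFlip,
                                    RandomResizedCrop, Resize, ToTensor)

# ToTensor already rescales pixel values to [0, 1]; no Normalize needed for MobileViT.
_train_transforms = Compose([
    RandomResizedCrop(256),
    RandomHorizontalFlip(),
    ToTensor(),
])
_val_transforms = Compose([
    Resize(288),
    CenterCrop(256),
    ToTensor(),
])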
I made a Colab notebook to illustrate running the script for MobileViT. Importantly, I did 2 things:
- remove normalization
- pass the ignore_mismatched_sizes argument to the script in order to fine-tune the already fine-tuned apple/mobilevit-small model (see the sketch below). Note that there's no need for this argument in case you'd like to train a model from scratch.
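In code, that boils down to something like this (the label mappings are illustrative):

from transformers import MobileViTForImageClassification

# ignore_mismatched_sizes=True discards the 1000-class ImageNet head of
# apple/mobilevit-small and initializes a new head with the right number of labels.
model = MobileViTForImageClassification.from_pretrained(
    "apple/mobilevit-small",
    num_labels=len(labels),  # `labels` is the list of class names of your dataset
    ignore_mismatched_sizes=True,
)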
Note that the arguments still need some tuning, e.g. the learning rate and number of steps can probably be improved, etc.
Thank you very much for your fast response! However, my problems arise when trying to evaluate the model on imagenet-1k (beans is not my target dataset), and just removing the normalization from the script does not help. It seems that the image processor chokes on some of the images. The line that breaks is:
example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]
The error is:
KeyError: 'image'
For some reason, example_batch does not contain the images at all.
I attempted to write my own script, which runs successfully on some number of images and then breaks with:
File "/home/ludmila/transformers/src/transformers/image_utils.py", line 119, in infer_channel_dimension_format
raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
ValueError: Unsupported number of image dimensions: 2
Here is my script:
from datasets import load_dataset, load_metric
from transformers import MobileViTFeatureExtractor, MobileViTForImageClassification
from transformers import TrainingArguments, Trainer
import torch
import numpy as np
import os

#ds = load_dataset('/data/imagenet-1k/', ignore_verifications=True)
data_files = {}
data_files["train"] = os.path.join("/data/imagenet/train", "**")
data_files["validation"] = os.path.join("/data/imagenet/val", "**")
ds = load_dataset(
    "imagefolder",
    data_files=data_files,
    #cache_dir="/data/",
    task="image-classification",
)

lab = "labels"
# Note: the checkpoint lives under the "apple" namespace on the Hub.
model_name_or_path = "apple/mobilevit-x-small"
feature_extractor = MobileViTFeatureExtractor.from_pretrained(model_name_or_path)

def collate_fn(batch):
    return {
        "pixel_values": torch.stack([x["pixel_values"] for x in batch]),
        "labels": torch.tensor([x[lab] for x in batch]),
    }

def transform(example_batch):
    # Take a list of PIL images and turn them into pixel values
    inputs = feature_extractor([x for x in example_batch["image"]], return_tensors="pt")
    # Don't forget to include the labels!
    inputs[lab] = example_batch[lab]
    return inputs

metric = load_metric("accuracy")

def compute_metrics(p):
    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)

prepared_ds = ds.with_transform(transform)
labels = prepared_ds["train"].features[lab].names

model = MobileViTForImageClassification.from_pretrained(
    model_name_or_path,
    ignore_mismatched_sizes=True,
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)},
)

training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=16,
    evaluation_strategy="steps",
    num_train_epochs=0,
    fp16=False,
    save_steps=100,
    eval_steps=100,
    logging_steps=10,
    learning_rate=2e-4,
    save_total_limit=2,
    remove_unused_columns=False,
    push_to_hub=False,
    report_to="tensorboard",
    load_best_model_at_end=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    train_dataset=prepared_ds["train"],
    eval_dataset=prepared_ds["validation"],
    tokenizer=feature_extractor,
)

metrics = trainer.evaluate(prepared_ds["validation"])
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
It looks like you need to convert your images to RGB before processing them (image = Image.open(...).convert("RGB"))
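Concretely, rewriting the transform from the script above along these lines should avoid the 2-dimensional (grayscale) image error; this is a sketch, not tested on imagenet-1k:

def transform(example_batch):
    # Convert every image to RGB first, so grayscale (2D) and palette images
    # end up with 3 channels before reaching the feature extractor.
    images = [img.convert("RGB") for img in example_batch["image"]]
    inputs = feature_extractor(images, return_tensors="pt")
    inputs[lab] = example_batch[lab]
    return inputs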
Any solution to this line?
example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]
The error is KeyError: 'image'; example_batch has no "image" key.
Check if your dataset contains PNG files and delete them. It helped in my case.
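If you'd rather check before deleting, a quick scan like this (paths copied from the script above) lists the PNG files, and optionally any non-RGB images:

import glob
from PIL import Image

for path in glob.glob("/data/imagenet/train/**/*.*", recursive=True):
    if path.lower().endswith(".png"):
        print("PNG file:", path)
    # Slower, but also catches grayscale/palette JPEGs:
    # elif Image.open(path).mode != "RGB":
    #     print("non-RGB image:", path)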