ValueError: Unsupported number of image dimensions: 2 - an error when embedding image data
System Info
I am hitting an error while encoding an image dataset with facebook/dino-vits16. I have run into this with grayscale images before as well, although the same code worked fine on the Bingsu/Human_Action_Recognition dataset. (A quick check of the image modes is included right after the version list below.)
Versions:
transformers==4.32.0
torch==2.0.1+cu118
datasets==2.14.4
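For reference, this is my own sanity-check snippet (run after loading the dataset as in the reproduction below); I suspect a few 'L'-mode (grayscale) images in the dataset are what triggers the error:

from collections import Counter

# Count the PIL modes of the first 10k images; any 'L' (grayscale) entries
# become 2D (H, W) arrays inside the image processor.
print(Counter(img.mode for img in dataset_train['image']))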
The error:
Some weights of ViTModel were not initialized from the model checkpoint at facebook/dino-vits16 and are newly initialized: ['pooler.dense.weight', 'pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map:   0%|          | 2/10000 [00:00<40:18, 4.13 examples/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-0547920c10ef> in <cell line: 22>()
20 return batch
21
---> 22 dataset_train = dataset_train.map(get_embeddings)
8 frames
/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py in wrapper(*args, **kwargs)
590 self: "Dataset" = kwargs.pop("self")
591 # apply actual function
--> 592 out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
593 datasets: List["Dataset"] = list(out.values()) if isinstance(out, dict) else [out]
594 for dataset in datasets:
/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py in wrapper(*args, **kwargs)
555 }
556 # apply actual function
--> 557 out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
558 datasets: List["Dataset"] = list(out.values()) if isinstance(out, dict) else [out]
559 # re-apply format to the output
/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py in map(self, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, load_from_cache_file, cache_file_name, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, suffix_template, new_fingerprint, desc)
3095 desc=desc or "Map",
3096 ) as pbar:
-> 3097 for rank, done, content in Dataset._map_single(**dataset_kwargs):
3098 if done:
3099 shards_done += 1
/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py in _map_single(shard, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, cache_file_name, writer_batch_size, features, disable_nullable, fn_kwargs, new_fingerprint, rank, offset)
3448 _time = time.time()
3449 for i, example in shard_iterable:
-> 3450 example = apply_function_on_filtered_inputs(example, i, offset=offset)
3451 if update_data:
3452 if i == 0:
/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py in apply_function_on_filtered_inputs(pa_inputs, indices, check_same_num_examples, offset)
3351 if with_rank:
3352 additional_args += (rank,)
-> 3353 processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
3354 if isinstance(processed_inputs, LazyDict):
3355 processed_inputs = {
<ipython-input-30-0547920c10ef> in get_embeddings(batch)
14
15 def get_embeddings(batch):
---> 16 inputs = processor(images=batch['image'], return_tensors="pt").to(device)
17 with torch.no_grad():
18 outputs = model(**inputs).last_hidden_state.mean(dim=1).cpu().numpy()
/usr/local/lib/python3.10/dist-packages/transformers/image_processing_utils.py in __call__(self, images, **kwargs)
544 def __call__(self, images, **kwargs) -> BatchFeature:
545 """Preprocess an image or a batch of images."""
--> 546 return self.preprocess(images, **kwargs)
547
548 def preprocess(self, images, **kwargs) -> BatchFeature:
/usr/local/lib/python3.10/dist-packages/transformers/models/vit/image_processing_vit.py in preprocess(self, images, do_resize, size, resample, do_rescale, rescale_factor, do_normalize, image_mean, image_std, return_tensors, data_format, input_data_format, **kwargs)
232 if input_data_format is None:
233 # We assume that all images have the same channel dimension format.
--> 234 input_data_format = infer_channel_dimension_format(images[0])
235
236 if do_resize:
/usr/local/lib/python3.10/dist-packages/transformers/image_utils.py in infer_channel_dimension_format(image, num_channels)
168 first_dim, last_dim = 1, 3
169 else:
--> 170 raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
171
172 if image.shape[first_dim] in num_channels:
ValueError: Unsupported number of image dimensions: 2
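For what it's worth, the same ValueError can be triggered in isolation with a single grayscale PIL image (a minimal sketch; the blank 'L'-mode image created by Image.new just stands in for the problematic examples in the dataset):

from PIL import Image
from transformers import ViTImageProcessor

processor = ViTImageProcessor.from_pretrained('facebook/dino-vits16')

# An 'L'-mode PIL image becomes a 2D (H, W) numpy array inside preprocess,
# and infer_channel_dimension_format only handles 3D/4D arrays.
gray = Image.new("L", (224, 224))
processor(images=gray, return_tensors="pt")  # raises the ValueError above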
Who can help?
@amyeroberts
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
from transformers import ViTImageProcessor, ViTModel
from datasets import load_dataset
import torch

dataset_train = load_dataset(
    'ashraq/fashion-product-images-small', split='train[:10000]'
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = ViTImageProcessor.from_pretrained('facebook/dino-vits16')
# Move the model to the same device the inputs are sent to below.
model = ViTModel.from_pretrained('facebook/dino-vits16').to(device)

def get_embeddings(batch):
    inputs = processor(images=batch['image'], return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs).last_hidden_state.mean(dim=1).cpu().numpy()
    batch['embeddings'] = outputs
    return batch

dataset_train = dataset_train.map(get_embeddings)
Expected behavior
I expected the map call to complete and return an embedding for every image in the dataset.
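In the meantime, converting each image to RGB before calling the processor should sidestep the 2D array (a sketch of that workaround, not a fix for the processor itself; .convert("RGB") is PIL's standard mode conversion):

def get_embeddings(batch):
    # Force three channels so the processor always receives an (H, W, 3)
    # array, even for grayscale items in the dataset.
    image = batch['image'].convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs).last_hidden_state.mean(dim=1).cpu().numpy()
    batch['embeddings'] = outputs
    return batch

dataset_train = dataset_train.map(get_embeddings)

That said, I would expect the image processor to handle single-channel images on its own.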