mm-cot Question: The code to generate Vision Features

I would like to learn more about the vision features. Would it be possible to share the code that generates the .npy file? I much appreciate the hard work here.

Mar 02 '23 15:03 roapple10

Bump

Mar 05 '23 18:03 gianfrancodemarco

Hi, I wrote this code snippet for visual feature extraction. Unfortunately, the features it produces on the ScienceQA dataset differ (slightly) from those provided in this repository. Even so, they have the same shape and can be used for both classification and rationale generation. I hope it is useful.

from transformers import AutoImageProcessor, DetrForObjectDetection
from PIL import Image
import torch

pretrained_model = "facebook/detr-resnet-101-dc5"
image_processor = AutoImageProcessor.from_pretrained(pretrained_model)
model = DetrForObjectDetection.from_pretrained(pretrained_model)

image_path = "img.jpg"
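# convert to RGB so grayscale or RGBA images are also accepted
image = Image.open(image_path).convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# run without gradient tracking; otherwise .numpy() fails because
# the output tensor requires grad
with torch.no_grad():
    outputs = model(**inputs)

# the last hidden states are the final query embeddings of the Transformer decoder
vision_features = outputs.last_hidden_state.numpy()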
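
To get from per-image features to a single .npy file, the arrays can be stacked and saved with NumPy. This is only a minimal sketch, not the authors' pipeline: the helper name stack_and_save and the output path are hypothetical, and it assumes every per-image array has the same shape (with a batch of one image, DETR's last_hidden_state has a leading batch axis of size 1, which is dropped here).

```python
import numpy as np


def stack_and_save(per_image_features, out_path):
    """Stack per-image feature arrays into one array and save it as .npy.

    per_image_features: list of arrays shaped (1, Q, D) or (Q, D).
    Returns the stacked float32 array of shape (N, Q, D).
    """
    squeezed = []
    for feats in per_image_features:
        feats = np.asarray(feats, dtype=np.float32)
        if feats.ndim == 3 and feats.shape[0] == 1:
            feats = feats[0]  # drop the batch axis of size 1
        squeezed.append(feats)
    # np.stack raises ValueError if the per-image shapes disagree
    stacked = np.stack(squeezed, axis=0)
    np.save(out_path, stacked)
    return stacked
```

For example, calling stack_and_save([vision_features], "detr_demo.npy") on the output of the snippet above would write an array with one row per image.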

Mar 12 '23 18:03 Francesco-Ranieri

Thanks to the authors for this awesome work!

Some questions in the dataset contain both an image for the question and images for the choices. I was wondering how the authors obtain the visual features in this case. Is some pooling function applied?

How do you deal with this case, Francesco-Ranieri?

Mar 14 '23 22:03 aiPenguin

As far as I understand from their implementation, a single image feature array is always used for each question. Since the code that generates the vision features is not available, we need an answer from the authors to know whether any pooling function was applied. However, I honestly think that only one image was taken into consideration.
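
For anyone who wants to experiment with several images per question anyway, one possible pooling is an element-wise mean over the per-image feature arrays. This is purely illustrative: nothing in this thread confirms that the authors pooled at all, and the function name mean_pool_features is hypothetical. It assumes all arrays share the same shape.

```python
import numpy as np


def mean_pool_features(feature_list):
    """Average several same-shaped feature arrays element-wise.

    feature_list: non-empty list of arrays, each shaped (Q, D).
    Returns one float32 array of shape (Q, D).
    """
    if not feature_list:
        raise ValueError("need at least one feature array")
    stacked = np.stack(
        [np.asarray(f, dtype=np.float32) for f in feature_list], axis=0
    )
    return stacked.mean(axis=0)
```

The pooled array keeps the per-image shape, so it could be stored in place of a single image's features.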

Mar 16 '23 21:03 Francesco-Ranieri

I share your opinion. However, I found that the .npy file contains more features than there are questions with image contexts, so I opened another issue about it: #46.
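
A check along these lines can be sketched as follows. It rests on assumptions not confirmed in this thread: that the problems file is a dict of question records whose "image" field is empty or missing when a question has no image, and that the features array has one row per entry along its first axis. Both helper names are hypothetical.

```python
import numpy as np


def count_questions_with_images(problems):
    """Count records whose 'image' field is set (assumed layout)."""
    return sum(1 for p in problems.values() if p.get("image"))


def count_nonzero_feature_rows(features):
    """Count rows along axis 0 that contain at least one non-zero value."""
    features = np.asarray(features)
    row_size = int(np.prod(features.shape[1:]))
    flat = features.reshape(len(features), row_size)
    return int(np.any(flat != 0, axis=1).sum())
```

Comparing the two counts, together with the length of the array's first axis, shows whether the feature file has more rows than there are questions with images.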

Mar 23 '23 16:03 aiPenguin
