Cannot reproduce results for Pix2struct on InfographicVQA
I am using the pix2struct-infographics-vqa-base and pix2struct-infographics-vqa-large models to run inference on InfographicVQA. However, I get 29.53 ANLS for base and 34.31 ANLS for large, which do not match the 38.2 and 40.0 reported in the original paper. Could anyone help with this?
Here is my inference code:
import requests
from PIL import Image
import torch
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-infographics-vqa-base").to("cuda")
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-infographics-vqa-base")
image_url = "https://blogs.constantcontact.com/wp-content/uploads/2019/03/Social-Media-Infographic.png"
image = Image.open(requests.get(image_url, stream=True).raw)
question = "Which social platform has heavy female audience?"
inputs = processor(images=image, text=question, return_tensors="pt").to("cuda")
predictions = model.generate(**inputs)
pred = processor.decode(predictions[0], skip_special_tokens=True)
gt = 'pinterest'
print(pred)
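Since the gap is reported in ANLS, the scoring step is worth ruling out too. Below is a minimal sketch of how ANLS could be computed, assuming the commonly used definition (case-insensitive normalized Levenshtein similarity, best match over all ground-truth answers, scores zeroed when the normalized distance reaches 0.5). The function names `levenshtein`, `anls_single`, and `anls` are hypothetical, and the official evaluation script may normalize answers differently.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls_single(prediction, ground_truths, threshold=0.5):
    """Best similarity of one prediction against its ground-truth answers."""
    pred = prediction.strip().lower()
    best = 0.0
    for gt in ground_truths:
        gt = gt.strip().lower()
        longest = max(len(pred), len(gt))
        distance = levenshtein(pred, gt) / longest if longest else 0.0
        # Assumption: answers at or beyond the threshold count as wrong.
        score = 1.0 - distance if distance < threshold else 0.0
        best = max(best, score)
    return best


def anls(predictions, ground_truths, threshold=0.5):
    """Average of the per-question scores over the whole dataset."""
    scores = [anls_single(p, g, threshold)
              for p, g in zip(predictions, ground_truths)]
    return sum(scores) / len(scores) if scores else 0.0
```

With the snippet above, `anls_single(pred, [gt])` would score the single example, and `anls` would average over a list of predictions paired with lists of accepted answers.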
cc @younesbelkada
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
gentle ping @younesbelkada
Hi everyone,
Sadly, I won't have the bandwidth to properly dig into this right now. @Lizw14, do you still face the same issue when using the main branch of transformers?
pip install git+https://github.com/huggingface/transformers.git
@Lizw14 quickly going back to the issue: can you double-check that you used the same hyperparameters as the ones presented in the paper? For example, what sequence length are you using? In what precision do you load the model (fp32, fp16, bf16, int8)? Ideally, can you share the full script you use to reproduce the results of the paper? Thanks!
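To make those settings explicit, here is a rough sketch of the inference step with the sequence length (`max_patches`), the precision, and the generation length all spelled out. The helper names `answer_question` and `main` are hypothetical, and the values 2048, 4096, `float32`, and `max_new_tokens=32` are illustrative placeholders, not the paper's settings; they should be checked against the paper before comparing scores.

```python
def answer_question(model, processor, image, question, max_patches, max_new_tokens=32):
    """Run one VQA query with an explicit patch budget and generation length."""
    # max_patches controls how many image patches (the input sequence length) are used.
    inputs = processor(images=image, text=question,
                       return_tensors="pt", max_patches=max_patches)
    # Move tensors to the model's device and cast float tensors to its dtype.
    inputs = inputs.to(model.device, model.dtype)
    predictions = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(predictions[0], skip_special_tokens=True)


def main():
    # Heavy imports are kept inside main() so the helper above has no dependencies.
    import requests
    import torch
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    checkpoint = "google/pix2struct-infographics-vqa-base"
    # Assumption: full precision; swap in torch.bfloat16 or torch.float16 to compare.
    model = Pix2StructForConditionalGeneration.from_pretrained(
        checkpoint, torch_dtype=torch.float32
    ).to("cuda")
    processor = Pix2StructProcessor.from_pretrained(checkpoint)

    image_url = "https://blogs.constantcontact.com/wp-content/uploads/2019/03/Social-Media-Infographic.png"
    image = Image.open(requests.get(image_url, stream=True).raw)
    question = "Which social platform has heavy female audience?"

    # Placeholder patch budgets; the value used in the paper should replace these.
    for max_patches in (2048, 4096):
        print(max_patches, answer_question(model, processor, image, question, max_patches))
```

Calling `main()` on a machine with a GPU would print one answer per patch budget, which makes it easy to see whether the sequence length changes the predictions.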