Release the evaluation scripts
Hello, I'm trying to test Otter on the MSCOCO dataset. Could you please release the evaluation code? Thanks for your help!
Thanks for your interest!
Demo: https://github.com/Luodian/Otter/blob/main/pipeline/demo/otter_image.py
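
Roughly, the demo boils down to loading the checkpoint, preprocessing one image, and calling `generate` on an "<image>User: ... GPT:<answer>" prompt. Here is a minimal sketch; the import path, checkpoint name, and `generate` arguments are assumptions taken from that script and may have changed, so please treat otter_image.py itself as the reference:

```python
# Minimal sketch of single-image inference with Otter, assuming the
# class/checkpoint names used in pipeline/demo/otter_image.py
# (OtterForConditionalGeneration, "luodian/otter-9b-hf"); check that
# script for the exact, up-to-date API.
import requests
import torch
import transformers
from PIL import Image
from otter.modeling_otter import OtterForConditionalGeneration  # path inside the Otter repo

model = OtterForConditionalGeneration.from_pretrained("luodian/otter-9b-hf", device_map="auto")
model.eval()
tokenizer = model.text_tokenizer
image_processor = transformers.CLIPImageProcessor()

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Vision input shape expected by the model: (batch, num_media, num_frames, C, H, W)
vision_x = (
    image_processor.preprocess([image], return_tensors="pt")["pixel_values"]
    .unsqueeze(1)
    .unsqueeze(0)
)
lang_x = tokenizer(["<image>User: What is in this image? GPT:<answer>"], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(
        vision_x=vision_x.to(model.device),
        lang_x=lang_x["input_ids"].to(model.device),
        attention_mask=lang_x["attention_mask"].to(model.device),
        max_new_tokens=256,
        num_beams=3,
        no_repeat_ngram_size=3,
    )
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```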
Hi, I am also curious about the evaluation setting. OpenFlamingo randomly selects images for the in-context prompt, and its reported CIDEr scores are 65.5/74.3/79.3/81.8 (0/4/8/16-shot), but the OpenFlamingo scores in your paper are 60.8/72.4/79.3/81.8 (0/4/8/16-shot). I would like to know why they differ.
OpenFlamingo's performance in our paper was evaluated on our side, so the 0-shot performance is a little different.
For the few-shot evaluation of OpenFlamingo and Otter, I referred to the following code. You can try running their code on OpenFlamingo and Otter to compare the performance. I once tested our models with their evaluation code and recorded OpenFlamingo's numbers.
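
Schematically, the few-shot captioning evaluation in that code builds an in-context prompt from N (image, caption) demos, generates a caption for each query image, and scores the results with CIDEr. A rough sketch, assuming the "<image>Output:{caption}<|endofchunk|>" template OpenFlamingo uses for captioning; `model_generate` is a placeholder for whichever model you plug in, not their actual API:

```python
# Sketch of an OpenFlamingo-style few-shot COCO captioning evaluation:
# sample N demos, build the in-context prompt, generate, score with CIDEr.
import random

from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer


def build_prompt(demos, num_shots):
    """demos: list of (image, caption). Returns (demo_images, prompt_text)."""
    shots = random.sample(demos, num_shots)
    context = "".join(f"<image>Output:{caption}<|endofchunk|>" for _, caption in shots)
    return [image for image, _ in shots], context + "<image>Output:"


def evaluate(model_generate, demos, queries, num_shots=4):
    """queries: list of (image, image_id, gt_captions). Returns the CIDEr score.

    model_generate(images, prompt) -> str is a placeholder for the model under test
    (OpenFlamingo or Otter wrapped to take interleaved images plus a text prompt).
    """
    predictions, references = {}, {}
    for image, image_id, gt_captions in queries:
        demo_images, prompt = build_prompt(demos, num_shots)
        caption = model_generate(demo_images + [image], prompt)
        predictions[image_id] = [{"caption": caption}]
        references[image_id] = [{"caption": c} for c in gt_captions]

    tok = PTBTokenizer()
    score, _ = Cider().compute_score(tok.tokenize(references), tok.tokenize(predictions))
    return score
```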
For more details (prompts and generation params) on Otter's evaluation on public datasets: we integrated the evaluation code into MMAGIBench's codebase, and they said they will release it soon (within one week).
Thanks for the response. I used the eval code provided by open-flamingo and replaced the model with Otter, and it works well for me. However, the result reaches 83.1 CIDEr (4-shot), which is better than the result reported in the paper. But when I change the prompt to "<Image>User: ... GPT: ...", which is adopted in the demo, it drops to only 48. I don't know why.
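
For reference, the two prompt templates I am comparing look roughly like this; the exact token spellings ("<answer>", "<|endofchunk|>") are my assumptions from the demo and eval code:

```python
# The two captioning prompt templates in question, shown for an N-shot context.
# The first mirrors the open-flamingo eval code, the second the Otter demo;
# token spellings are assumptions, not verified against either repo.

def openflamingo_prompt(demo_captions):
    context = "".join(f"<image>Output:{c}<|endofchunk|>" for c in demo_captions)
    return context + "<image>Output:"


def otter_demo_prompt(demo_captions, instruction="What is in this image?"):
    context = "".join(
        f"<image>User: {instruction} GPT:<answer> {c}<|endofchunk|>" for c in demo_captions
    )
    return context + f"<image>User: {instruction} GPT:<answer>"
```

Everything else (demos, generation params, post-processing) is kept identical, so the CIDEr gap should come from the template alone.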
🤨 Prompt engineering is tricky here. We didn't study this part much.
When evaluating, do you use the same prompt as open-flamingo ("<image>outputs:")?