Release the evaluation scripts
Hello, I'm trying to test Otter on the MSCOCO dataset. Could you please release the evaluation code? Thanks for your help!
Thanks for your interest!
Demo: https://github.com/Luodian/Otter/blob/main/pipeline/demo/otter_image.py
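
Roughly, the demo boils down to loading the checkpoint, preprocessing one image, and calling `generate` on an "<image>User: ... GPT:<answer>" prompt. Here is a minimal sketch; the import path, checkpoint name, and `generate` arguments are assumptions taken from that script and may have changed, so please treat otter_image.py itself as the reference:

```python
# Minimal sketch of single-image inference with Otter, assuming the
# class/checkpoint names used in pipeline/demo/otter_image.py
# (OtterForConditionalGeneration, "luodian/otter-9b-hf"); check that
# script for the exact, up-to-date API.
import requests
import torch
import transformers
from PIL import Image
from otter.modeling_otter import OtterForConditionalGeneration  # path inside the Otter repo

model = OtterForConditionalGeneration.from_pretrained("luodian/otter-9b-hf", device_map="auto")
model.eval()
tokenizer = model.text_tokenizer
image_processor = transformers.CLIPImageProcessor()

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Vision input shape expected by the model: (batch, num_media, num_frames, C, H, W)
vision_x = (
    image_processor.preprocess([image], return_tensors="pt")["pixel_values"]
    .unsqueeze(1)
    .unsqueeze(0)
)
lang_x = tokenizer(["<image>User: What is in this image? GPT:<answer>"], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(
        vision_x=vision_x.to(model.device),
        lang_x=lang_x["input_ids"].to(model.device),
        attention_mask=lang_x["attention_mask"].to(model.device),
        max_new_tokens=256,
        num_beams=3,
        no_repeat_ngram_size=3,
    )
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```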
Hi, I am also curious about the evaluation setting. OpenFlamingo randomly selects images for the in-context prompt, and its reported CIDEr scores are 65.5/74.3/79.3/81.8 (0/4/8/16-shot), but the OpenFlamingo scores in your paper are 60.8/72.4/79.3/81.8 (0/4/8/16-shot). I would like to know why they differ.
OpenFlamingo's performance in our paper was evaluated on our side, so the 0-shot performance is a little different.
For the few-shot evaluation of OpenFlamingo and Otter, I referred to the following code. You can try running their code on OpenFlamingo and Otter to compare the performance. I once tested our models with their evaluation code and recorded OpenFlamingo's numbers.
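
Schematically, the few-shot captioning evaluation in that code builds an in-context prompt from N (image, caption) demos, generates a caption for each query image, and scores the results with CIDEr. A rough sketch, assuming the "<image>Output:{caption}<|endofchunk|>" template OpenFlamingo uses for captioning; `model_generate` is a placeholder for whichever model you plug in, not their actual API:

```python
# Sketch of an OpenFlamingo-style few-shot COCO captioning evaluation:
# sample N demos, build the in-context prompt, generate, score with CIDEr.
import random

from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer


def build_prompt(demos, num_shots):
    """demos: list of (image, caption). Returns (demo_images, prompt_text)."""
    shots = random.sample(demos, num_shots)
    context = "".join(f"<image>Output:{caption}<|endofchunk|>" for _, caption in shots)
    return [image for image, _ in shots], context + "<image>Output:"


def evaluate(model_generate, demos, queries, num_shots=4):
    """queries: list of (image, image_id, gt_captions). Returns the CIDEr score.

    model_generate(images, prompt) -> str is a placeholder for the model under test
    (OpenFlamingo or Otter wrapped to take interleaved images plus a text prompt).
    """
    predictions, references = {}, {}
    for image, image_id, gt_captions in queries:
        demo_images, prompt = build_prompt(demos, num_shots)
        caption = model_generate(demo_images + [image], prompt)
        predictions[image_id] = [{"caption": caption}]
        references[image_id] = [{"caption": c} for c in gt_captions]

    tok = PTBTokenizer()
    score, _ = Cider().compute_score(tok.tokenize(references), tok.tokenize(predictions))
    return score
```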
For more details (prompts and generation params) on Otter's evaluation on public datasets: we integrated the evaluation code into MMAGIBench's codebase, and they said they will release it soon (within one week).
Thanks for the response. I used the eval code provided by open-flamingo and replaced the model with Otter, and it works well for me. However, the result reaches 83.1 CIDEr (4-shot), which is better than the result reported in the paper. But when I change the prompt to "<Image>User: ... GPT: ...", which is adopted in the demo, it drops to only 48. I don't know why.
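
For reference, the two prompt templates I am comparing look roughly like this; the exact token spellings ("<answer>", "<|endofchunk|>") are my assumptions from the demo and eval code:

```python
# The two captioning prompt templates in question, shown for an N-shot context.
# The first mirrors the open-flamingo eval code, the second the Otter demo;
# token spellings are assumptions, not verified against either repo.

def openflamingo_prompt(demo_captions):
    context = "".join(f"<image>Output:{c}<|endofchunk|>" for c in demo_captions)
    return context + "<image>Output:"


def otter_demo_prompt(demo_captions, instruction="What is in this image?"):
    context = "".join(
        f"<image>User: {instruction} GPT:<answer> {c}<|endofchunk|>" for c in demo_captions
    )
    return context + f"<image>User: {instruction} GPT:<answer>"
```

Everything else (demos, generation params, post-processing) is kept identical, so the CIDEr gap should come from the template alone.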
🤨 Prompt engineering is tricky here. We didn't study this part much.
When evaluating, do you use the same prompt as open-flamingo ("<image>outputs:")?