Reproduce SEED-LLaMA evaluation
Thanks for your great work.
I have a question about the SEED-LLaMA evaluation settings. I tried to reproduce the VQA accuracy of the instruction-tuned SEED-LLaMA 8B on the VQAv2 dataset, but I cannot reproduce the result reported in the paper (66.2).
I ran the evaluation on 8x A100 80GB GPUs with a batch size of 1. This is the generation config I used:
generation_config = {
    'temperature': 1.0,
    'num_beams': 1,
    'max_new_tokens': 64,
    'top_p': 0.5,
    'do_sample': True
}
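For comparison, the config above samples (do_sample=True with temperature 1.0 and top_p 0.5), so outputs can differ between runs, and VQAv2 scoring matches short answer strings. A deterministic variant that could be tried is sketched below. The helper name `to_greedy` and the value `max_new_tokens=10` are assumptions for illustration, not the settings used in the paper; the keys are assumed to follow the Hugging Face `generate` keyword names, as the original config appears to.

```python
# Sampling config as posted above.
generation_config = {
    'temperature': 1.0,
    'num_beams': 1,
    'max_new_tokens': 64,
    'top_p': 0.5,
    'do_sample': True,
}

# Keys that only affect sampling and are dropped for greedy decoding.
SAMPLING_ONLY_KEYS = ('temperature', 'top_p', 'top_k')


def to_greedy(config, max_new_tokens=10):
    """Return a greedy-decoding copy of `config` (hypothetical helper).

    The default max_new_tokens is a guess based on VQAv2 answers
    usually being only a few words long.
    """
    greedy = {k: v for k, v in config.items() if k not in SAMPLING_ONLY_KEYS}
    greedy['do_sample'] = False      # always pick the most likely token
    greedy['num_beams'] = 1          # plain greedy search, no beam search
    greedy['max_new_tokens'] = max_new_tokens
    return greedy


greedy_generation_config = to_greedy(generation_config)
```

Running both configs on the same subset of questions would show whether the gap to the reported score comes from the decoding settings or from something else, such as the prompt format or answer post-processing.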
This is the result computed by the official evaluation server: "test-dev": {"yes/no": 38.59, "number": 23.68, "other": 39.1, "overall": 37.14}
I would be grateful if you could share your evaluation settings or any advice.