LLaMA-VID
Incomplete evaluation on MSVD-QA dataset.
Hi! I'm trying to reproduce the video evaluation results for llama-vid-7b-full-224-video-fps-1, but after running the provided scripts with the official checkpoint on MSVD-QA, not all of the files receive predictions. What could be the cause of this?
To provide some context, here is the result file I obtained after running the evaluation script: results (2).json
Hi, this happens when GPT doesn't return feedback. It may be caused by network issues or GPT response issues. You can first check the network and make sure GPT works properly, then run the evaluation script (L27-L34 here) again. It will resume from the incomplete files.
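For reference, the resume step boils down to comparing the full question list against the results already written and re-running only the missing ones. Here is a minimal sketch of that logic; the file layout and the `"id"` key are assumptions for illustration, not the actual LLaMA-VID format:

```python
import json
import os


def find_incomplete(qa_file, results_file):
    """Return the question ids that still lack a prediction.

    Hypothetical layout: qa_file is a JSON list of {"id": ...} questions,
    results_file is a JSON list of {"id": ...} entries already scored by GPT.
    """
    with open(qa_file) as f:
        all_ids = {q["id"] for q in json.load(f)}
    # A partial (or missing) results file simply means more work remains.
    done = set()
    if os.path.exists(results_file):
        with open(results_file) as f:
            done = {r["id"] for r in json.load(f)}
    return sorted(all_ids - done)
```

Running the script again would then loop over `find_incomplete(...)` and query GPT only for those ids, appending to the results file, so repeated runs converge to a complete evaluation.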
Thank you! May I ask which api_base you used for evaluation? I found that GPT's behavior seems to differ for gpt-3.5-turbo on my api base, which caused about a 7% difference in accuracy.
Hi, we use a purchased api base. We tested several times and did not observe such a large gap. Are the other packages kept the same, e.g. transformers?
Yes, the other packages are the same. The accuracies of (a) the results.json you provided in another issue, (b) the results predicted from the provided checkpoints, and (c) the results predicted from the re-implemented model are very close.