LLaMA-VID
Incomplete evaluation on MSVD-QA dataset.
Hi! I'm trying to reproduce the video evaluation results for llama-vid-7b-full-224-video-fps-1, but after running the provided scripts with the official checkpoint on MSVD-QA, not all of the files receive predictions. What could be the cause of this?
To provide some context, here is the result file I obtained after running the evaluation script: results (2).json
Hi, this happens when GPT doesn't return feedback. It may be caused by network issues or GPT response issues. You can first check the network and make sure GPT works properly, then run the evaluation script (L27-L34 here) again. It will resume from the incomplete files.
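For reference, the resume step boils down to comparing the full question list against the results already written and re-running only the missing ones. Here is a minimal sketch of that logic; the file layout and the `"id"` key are assumptions for illustration, not the actual LLaMA-VID format:

```python
import json
import os


def find_incomplete(qa_file, results_file):
    """Return the question ids that still lack a prediction.

    Hypothetical layout: qa_file is a JSON list of {"id": ...} questions,
    results_file is a JSON list of {"id": ...} entries already scored by GPT.
    """
    with open(qa_file) as f:
        all_ids = {q["id"] for q in json.load(f)}
    # A partial (or missing) results file simply means more work remains.
    done = set()
    if os.path.exists(results_file):
        with open(results_file) as f:
            done = {r["id"] for r in json.load(f)}
    return sorted(all_ids - done)
```

Running the script again would then loop over `find_incomplete(...)` and query GPT only for those ids, appending to the results file, so repeated runs converge to a complete evaluation.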
Thank you! May I ask which api_base you used for evaluation? I found that GPT's behavior seems to differ for gpt-3.5-turbo on my api base, which caused about a 7% difference in accuracy.
Hi, we use a purchased api base. We tested several times and did not observe such a large gap. Are the other packages kept the same, e.g. transformers?
Yes, the other packages are the same. The accuracies of (a) the results.json you provided in another issue, (b) the results predicted from the provided checkpoints, and (c) the results predicted from the re-implemented model are very close.