TSPO reproduction question
Hi, authors, thank you for your great work!
I tried to reproduce your results by training TSPO following the training scheme and data described in the repository. My training platform is a machine with 8 NVIDIA L40S GPUs. However, my reproduced model performs much worse than the accuracy reported in your paper.
The TSPO-0.4B model I trained with LLaVA-Video-Qwen2-7B only achieves 0.5617 accuracy on the LongVideoBench validation set, which is even lower than the uniform sampling baseline with LLaVA-Video-Qwen2-7B.
I would like to ask whether TSPO training is generally stable under your setup, or whether you have observed large variance across different random seeds.
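For reference, this is roughly how I pin the randomness when relaunching a run to check seed-to-seed variance (a minimal sketch; the `seed_everything` helper is my own and not part of the TSPO repository):

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Pin the common sources of randomness before a training run."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# I launch the same training config with seeds 0, 1, 2 to gauge the variance.
seed_everything(0)
```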
Any hints or additional guidance on reproducing your results would be greatly appreciated. Thanks again for your work!
Thank you for your interest in our work. Regarding your question, we have seen cases where repeated training runs produce slightly different test results, but we have never observed a discrepancy as large as the one you report. Have you conducted the following experiments? (A quick sketch for comparing the resulting accuracies follows the list.)
- Direct testing using the original LLaVA-Video-Qwen2-7B
- Testing using our provided TSPO-0.4B
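Once you have the per-sample predictions from those runs, something like the sketch below can compare their accuracies side by side. The results-file layout (a JSONL with `pred` and `answer` fields) and the file names are only assumptions for illustration; adapt them to whatever your evaluation script actually writes out.

```python
import json
from pathlib import Path


def accuracy(results_path: str) -> float:
    """Compute accuracy from a JSONL of per-sample predictions.

    Assumes each line looks like {"pred": "A", "answer": "A"}; adjust the
    field names to match your own evaluation output.
    """
    lines = [ln for ln in Path(results_path).read_text().splitlines() if ln.strip()]
    records = [json.loads(ln) for ln in lines]
    correct = sum(r["pred"] == r["answer"] for r in records)
    print(f"{results_path}: {correct}/{len(records)} = {correct / len(records):.4f}")
    return correct / len(records)


# Hypothetical file names for the three runs being compared.
for run in ["uniform_baseline.jsonl", "tspo_released.jsonl", "tspo_reproduced.jsonl"]:
    accuracy(run)
```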
Thank you so much for your reply. Testing both the original LLaVA-Video-Qwen2-7B and your TSPO-0.4B works as expected. I’ll try to reproduce the results again.
Thank you for your attention! You may check the following:
- Has each piece of data been tested successfully?
- Is the correct checkpoint loaded? (See the sketch after this list for a quick way to verify.)
- You can run demo.py to debug and troubleshoot.
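For the checkpoint check in particular, loading the saved weights non-strictly and inspecting the reported key mismatches is a quick way to confirm they are actually being applied. This is a minimal, generic PyTorch sketch, not code from our repository; `check_checkpoint` and the example path are placeholders for your own model object and saved file.

```python
import torch


def check_checkpoint(model: torch.nn.Module, ckpt_path: str) -> None:
    """Load a checkpoint non-strictly and report any key mismatches."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    # Unwrap common nesting such as {"state_dict": {...}} if present.
    if isinstance(state_dict, dict) and "state_dict" in state_dict:
        state_dict = state_dict["state_dict"]
    result = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys ({len(result.missing_keys)}):", result.missing_keys[:10])
    print(f"unexpected keys ({len(result.unexpected_keys)}):", result.unexpected_keys[:10])


# Usage (placeholder path): check_checkpoint(tspo_model, "checkpoints/tspo-0.4b/model.pt")
```

A long list of missing or unexpected keys usually indicates that the trained weights are not being applied as intended.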