
Cogvideo2B score

CacacaLalala opened this issue 1 year ago • 8 comments

Hi, I see that the total score of CogVideoX-2B on the leaderboard is 80.94%, but after running inference with all_dimension_long.txt, the total score I measured is only 78.68%. The videos I produced with CogVideoX-2B were 8 fps, 6 s long, at 480x720 resolution. May I ask why my result is so much lower than the one on the leaderboard? Looking forward to your reply, thanks a lot.

CacacaLalala avatar Aug 23 '24 07:08 CacacaLalala

Could you provide the details of the model checkpoint and sampling setting?

ziqihuangg avatar Aug 23 '24 08:08 ziqihuangg

> Could you provide the details of the model checkpoint and sampling setting?

The model weights were downloaded from https://huggingface.co/THUDM/CogVideoX-2b/tree/main. The inference code is inference/cli_demo.py from the CogVideoX repo, and the sampling settings are the defaults; I only changed some directory paths.

CacacaLalala avatar Aug 23 '24 09:08 CacacaLalala

Hello, here are our settings for sampling CogVideoX-2B (last line): https://github.com/Vchitect/VBench/tree/master/sampled_videos#what-are-the-details-of-the-video-generation-models. We use the SAT weights to sample videos for evaluation.

DZY-irene avatar Aug 23 '24 09:08 DZY-irene

For evaluation, we use the VBench-Long code to evaluate the sampled videos.

DZY-irene avatar Aug 23 '24 13:08 DZY-irene

Hi, I also tried the SAT weights to sample videos and got a new result of 79.75%, which is still much lower than the reported result. For evaluation, I am still using the old evaluation code; could that cause a problem? The only difference between using the longer txt and the original one is the length of the prompts.

CacacaLalala avatar Aug 26 '24 08:08 CacacaLalala

What prompt list did you use?

ziqihuangg avatar Aug 26 '24 08:08 ziqihuangg

This one: https://github.com/Vchitect/VBench/blob/master/prompts/gpt_enhanced_prompts/all_dimension_longer.txt

CacacaLalala avatar Aug 26 '24 09:08 CacacaLalala

Hi, we've released the following details for evaluating CogVideoX-2B:

  • videos used for evaluation: https://drive.google.com/file/d/1zuQ47Uvze4157o4YMta0Zqz9G8TdHcXZ/view?usp=share_link
  • code used for evaluation: https://github.com/Vchitect/VBench/tree/master/vbench2_beta_long

Feel free to let us know if you find any discrepancy. Thanks!

ziqihuangg avatar Dec 29 '24 07:12 ziqihuangg

Closing the issue for now. Feel free to reopen if you feel it's not solved.

ziqihuangg avatar May 06 '25 07:05 ziqihuangg