
Cogvideo2B score

CacacaLalala opened this issue 1 year ago • 8 comments

Hi, I see that the total score of CogVideoX-2B on the leaderboard is 80.94%, but after running inference with all_dimension_long.txt, the total score I measured is only 78.68%. The videos I produced with CogVideoX-2B were 8 fps, 6 s long, at 480x720 resolution. May I ask why my result is so much lower than the one on the leaderboard? Looking forward to your reply, thanks a lot.

CacacaLalala avatar Aug 23 '24 07:08 CacacaLalala

Could you provide the details of the model checkpoint and sampling setting?

ziqihuangg avatar Aug 23 '24 08:08 ziqihuangg

> Could you provide the details of the model checkpoint and sampling setting?

The model weights were downloaded from https://huggingface.co/THUDM/CogVideoX-2b/tree/main. The inference code is inference/cli_demo.py from the CogVideoX repo, and the sampling settings are the defaults; I only changed some directory paths.

CacacaLalala avatar Aug 23 '24 09:08 CacacaLalala

Hello, here are our settings for sampling CogVideoX-2B (last line): https://github.com/Vchitect/VBench/tree/master/sampled_videos#what-are-the-details-of-the-video-generation-models. We use the SAT weights to sample videos for evaluation.

DZY-irene avatar Aug 23 '24 09:08 DZY-irene

For evaluation, we use the VBench-Long code to evaluate the sampled videos.

DZY-irene avatar Aug 23 '24 13:08 DZY-irene

Hi, I also tried the SAT weights to sample videos and got a new result of 79.75%, which is still much lower than the reported result. For evaluation, I am still using the old evaluation code; could that cause a problem? The only difference between using the longer txt and the original one is the length of the prompts.

CacacaLalala avatar Aug 26 '24 08:08 CacacaLalala

What prompt list did you use?

ziqihuangg avatar Aug 26 '24 08:08 ziqihuangg

This one: https://github.com/Vchitect/VBench/blob/master/prompts/gpt_enhanced_prompts/all_dimension_longer.txt

CacacaLalala avatar Aug 26 '24 09:08 CacacaLalala

Hi, we've released the following details for evaluating CogVideoX-2B:

  • videos used for evaluation: https://drive.google.com/file/d/1zuQ47Uvze4157o4YMta0Zqz9G8TdHcXZ/view?usp=share_link
  • code used for evaluation: https://github.com/Vchitect/VBench/tree/master/vbench2_beta_long

Feel free to let us know if you find any discrepancy. Thanks!

ziqihuangg avatar Dec 29 '24 07:12 ziqihuangg

Closing the issue for now. Feel free to reopen if you feel it's not solved.

ziqihuangg avatar May 06 '25 07:05 ziqihuangg