
SAT sampling results are worse than Diffusers sampling results

Open • xvjiarui opened this issue 5 months ago • 1 comment

System Info

Hi CogVideo Team,

First of all, thank you so much for open-sourcing such great models for the community to research text-to-video generative models.

I tried both the Diffusers and SAT codebases and found that the sampling results from SAT are much worse than those from Diffusers. Here are some examples:

  1. Prompt: "A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. the scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field."

Diffusers:

https://github.com/user-attachments/assets/7f6db0ee-db6a-4f28-9dd0-288f41d61a43

SAT:

https://github.com/user-attachments/assets/4526c32d-9a65-4c50-ae86-9aace94cb4a2

  2. Prompt: "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about."

Diffusers:

https://github.com/user-attachments/assets/46788cfd-7986-4d26-b7ae-5b06929b98ed

SAT:

https://github.com/user-attachments/assets/561e52e1-b1e5-49e4-8115-7cda950d9a3b

It would be very kind of the authors to look into this issue; resolving it will help the research community build exciting projects on CogVideoX. I truly appreciate your help and look forward to your reply.

Best, Jiarui

Information

  • [X] The official example scripts
  • [ ] My own modified scripts and tasks

Reproduction

Run the provided inference scripts:

Diffusers: https://github.com/THUDM/CogVideo/blob/main/inference/cli_demo.py

SAT: https://github.com/THUDM/CogVideo/blob/main/sat/inference.sh
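A side-by-side comparison like this is only meaningful if both runs used matching sampling settings (steps, guidance scale, seed, precision, and so on). A minimal sketch for checking that, assuming the settings from each run have been copied by hand into plain Python dicts: the function `diff_sampling_settings` and every key and value in the usage lines are hypothetical placeholders, not the actual defaults of `cli_demo.py` or `inference.sh`.

```python
from typing import Any, Dict, Tuple

# Sentinel marking a key that is absent from one of the two settings dicts.
_MISSING = object()


def diff_sampling_settings(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    Keys present in only one dict are reported with "<missing>" on the other side.
    """
    diffs: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(a) | set(b)):
        va = a.get(key, _MISSING)
        vb = b.get(key, _MISSING)
        if va != vb:
            diffs[key] = (
                "<missing>" if va is _MISSING else va,
                "<missing>" if vb is _MISSING else vb,
            )
    return diffs


if __name__ == "__main__":
    # Placeholder values for illustration only; fill in what each script actually used.
    diffusers_run = {"num_inference_steps": 50, "guidance_scale": 6.0, "seed": 42, "dtype": "bfloat16"}
    sat_run = {"num_inference_steps": 50, "guidance_scale": 6.0, "seed": 1234, "dtype": "float16"}
    print(diff_sampling_settings(diffusers_run, sat_run))
    # {'dtype': ('bfloat16', 'float16'), 'seed': (42, 1234)}
```

An empty result would suggest the quality gap comes from the codebases rather than from mismatched settings; a non-empty one points at what to align before re-running.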

Expected behavior

The SAT results are expected to be of the same quality as the Diffusers results.

xvjiarui • Aug 29 '24 05:08