Different text prompts generate almost the same video generation result.

Open: Wangyupei opened this issue 1 year ago • 1 comment

Thanks for your great work. In our experiments, we tried different text prompts following your instructions (the dataset preparation and inference code), but the video generation results are almost the same. Is there anything wrong?

Sep 05 '24 07:09 Wangyupei

Hi, sorry for the late reply. The current code uses the ground-truth (GT) image as the conditional frame and generates the subsequent video frames from it at inference time. Because those frames are highly correlated with the conditional frame, changing the text prompt has little effect on the attributes described in the text.

Sep 14 '24 08:09 wenyuqing
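
One way to check how much a prompt change actually affects the output is to compare two generated clips frame by frame. The snippet below is only a rough sketch and is not part of the repository: `per_frame_mad` is a hypothetical helper, and the two "videos" are toy stand-ins (short lists of flattened pixel values), not real model output. If both clips are conditioned on the same GT frame, the first frame should match exactly, and small values on the later frames would suggest the prompt has little influence.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one frame flattened to pixel values in [0, 1]


def per_frame_mad(video_a: Sequence[Frame], video_b: Sequence[Frame]) -> List[float]:
    """Mean absolute pixel difference between matching frames of two videos."""
    if len(video_a) != len(video_b):
        raise ValueError("videos must have the same number of frames")
    diffs = []
    for frame_a, frame_b in zip(video_a, video_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of pixels")
        total = sum(abs(a - b) for a, b in zip(frame_a, frame_b))
        diffs.append(total / len(frame_a))
    return diffs


# Toy stand-ins for two generated clips (hypothetical data, not model output).
# Both start from the same conditional frame and only differ in the last frame.
cond_frame = [0.2, 0.4, 0.6, 0.8]
video_prompt_a = [cond_frame, [0.25, 0.5, 0.5, 0.75], [0.25, 0.5, 0.5, 0.75]]
video_prompt_b = [cond_frame, [0.25, 0.5, 0.5, 0.75], [0.5, 0.5, 0.5, 0.5]]

diffs = per_frame_mad(video_prompt_a, video_prompt_b)
print(diffs)  # [0.0, 0.0, 0.125]
```

With real outputs, each frame would first need to be decoded and flattened to the same resolution and value range before calling the helper.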