About noise difference in video & image generation

Open XuWuLingYu opened this issue 10 months ago • 2 comments

In image generation, all views use the same initial noise at validation: torch.stack([latents]*6,dim=0). In video generation, however, doing the same does not seem to work; the noise must differ between views. Is there any research that explains why?

Feb 19 '25 08:02 XuWuLingYu

We did not strictly verify these options.

We did try training image generation with the same noise added to every view, and it did not work.

I think the expected approach is to use different noise for each view, whether for multi-view image generation or video generation. So it may not be necessary to use the same initial noise for image generation.
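
For reference, the two options discussed here can be sketched in PyTorch as follows. This is only an illustration, not the repository's code: the helper name make_view_noise, the latent shape (4, 8, 8), and the seed are assumptions made up for the example; only the torch.stack([latents]*6,dim=0) pattern comes from the question above.

```python
import torch


def make_view_noise(shape, num_views, shared, seed=0):
    """Return initial noise of shape (num_views, *shape).

    Hypothetical helper for illustration; not part of the project.
    """
    gen = torch.Generator().manual_seed(seed)
    if shared:
        # One noise sample, copied to every view (the pattern from the question).
        latents = torch.randn(shape, generator=gen)
        return torch.stack([latents] * num_views, dim=0)
    # One independent noise sample per view.
    return torch.randn((num_views, *shape), generator=gen)


# Assumed toy latent shape (channels, height, width) and 6 views.
shared_noise = make_view_noise((4, 8, 8), num_views=6, shared=True)
independent_noise = make_view_noise((4, 8, 8), num_views=6, shared=False)

# Every view is identical in the shared case, and differs in the independent case.
print(torch.equal(shared_noise[0], shared_noise[1]))
print(torch.equal(independent_noise[0], independent_noise[1]))
```

Both tensors have the same shape, so switching between the two at validation should only require changing how the noise is sampled, assuming the rest of the pipeline takes noise with a leading view dimension.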

Feb 19 '25 11:02 flymin

This issue is stale because it has been open for 7 days with no activity. If you do not have any follow-ups, the issue will be closed soon.

Feb 27 '25 16:02 github-actions[bot]