About noise difference in video & image generation
In image generation, all views share the same initial noise at validation time:
torch.stack([latents]*6,dim=0)
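For comparison, here is a minimal sketch of the two ways to build initial latents for several views. It assumes 6 views (as in the snippet above) and a hypothetical latent shape of (4, 32, 32). The helper name `make_initial_latents` is made up for illustration and is not from any specific codebase.

```python
import torch

def make_initial_latents(num_views, shape, shared, generator=None):
    """Return initial noise of shape (num_views, *shape).

    Hypothetical helper for illustration only.
    shared=True  -> every view gets the same noise tensor
                    (the torch.stack([latents] * 6, dim=0) pattern).
    shared=False -> each view gets its own independent Gaussian sample.
    """
    if shared:
        latents = torch.randn(shape, generator=generator)
        return torch.stack([latents] * num_views, dim=0)
    return torch.randn((num_views, *shape), generator=generator)

gen = torch.Generator().manual_seed(0)
shape = (4, 32, 32)  # assumed latent shape (channels, height, width)
same = make_initial_latents(6, shape, shared=True, generator=gen)
diff = make_initial_latents(6, shape, shared=False, generator=gen)
print(same.shape, torch.equal(same[0], same[5]))  # torch.Size([6, 4, 32, 32]) True
print(diff.shape, torch.equal(diff[0], diff[5]))  # torch.Size([6, 4, 32, 32]) False
```

Both variants have the same shape; the only difference is whether the views start from identical or independent noise.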
In video generation, however, the same approach does not seem to work: the noise has to differ between views. Is there any research that explains why?
We have not rigorously verified these options.
We did try training image generation with the same noise added to all views, and it did not work.
I think the intended approach is probably to use different noise for each individual view, for both multi-view and video generation. Thus, it may not be necessary to use the same initial noise for image generation.
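To make the training-side difference concrete, here is a sketch of adding noise to a batch of views, assuming a standard DDPM forward process x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps with a linear beta schedule (1e-4 to 0.02 over 1000 steps). The schedule, the helper name `add_noise_per_view`, and the latent shape are assumptions for illustration, not the setup used in this project.

```python
import torch

# Assumed DDPM linear beta schedule (1000 steps); not taken from this project.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise_per_view(x0, t, alphas_cumprod, shared_noise=False, generator=None):
    """Forward-diffuse clean latents x0 of shape (views, C, H, W) to step t.

    Hypothetical helper. shared_noise=False draws an independent eps per view;
    shared_noise=True reuses one eps for every view.
    """
    if shared_noise:
        eps = torch.randn(x0.shape[1:], generator=generator).expand_as(x0)
    else:
        eps = torch.randn(x0.shape, generator=generator)
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return x_t, eps  # eps is the usual regression target for the denoiser

gen = torch.Generator().manual_seed(0)
x0 = torch.randn((6, 4, 32, 32), generator=gen)  # 6 views, assumed latent shape
x_t, eps = add_noise_per_view(x0, 500, alphas_cumprod, generator=gen)
x_t_s, eps_s = add_noise_per_view(x0, 500, alphas_cumprod, shared_noise=True, generator=gen)
```

With `shared_noise=True`, every view's noise target is identical, which is the setting reported above as not working.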
This issue is stale because it has been open for 7 days with no activity. If you do not have any follow-ups, the issue will be closed soon.