QiZuo
When I tried to figure out how to adapt the framework for text2video synthesis, I found that the SpatialTemporalUNet has an input channel count of 8, as shown in this line:...
Hello, can you release the code that back-projects the pixels of the renders and extracts the 4K points?
```
def sample_loss(self, x0, noise=None):
    if noise is None:
        noise = torch.randn_like(x0)
    if self.noise_strength > 0:
        b, c, f, _, _ = x0.shape
        offset_noise = torch.randn(b, c, f, 1, 1, device=x0.device)
        ...
```