
Conditioning on y_cond

PouriaRouzrokh opened this issue · 2 comments

Hi, thanks for the awesome codes :)

One question for the inpainting task:

Looking at the following snippet from your code in networks.py, I cannot understand why you condition the model on y_cond when you are already modifying y_noisy based on the y_0 image via the expression `y_noisy*mask + (1.-mask)*y_0`.

Shouldn't concatenating y_cond be redundant in this case? The model already sees the ground-truth parts of the image in the modified y_noisy.

    def forward(self, y_0, y_cond=None, mask=None, noise=None):
        # sampling from p(gammas)
        b, *_ = y_0.shape
        t = torch.randint(1, self.num_timesteps, (b,), device=y_0.device).long()
        gamma_t1 = extract(self.gammas, t-1, x_shape=(1, 1))
        sqrt_gamma_t2 = extract(self.gammas, t, x_shape=(1, 1))
        sample_gammas = (sqrt_gamma_t2-gamma_t1) * torch.rand((b, 1), device=y_0.device) + gamma_t1
        sample_gammas = sample_gammas.view(b, -1)

        noise = default(noise, lambda: torch.randn_like(y_0))
        y_noisy = self.q_sample(
            y_0=y_0, sample_gammas=sample_gammas.view(-1, 1, 1, 1), noise=noise)

        if mask is not None:
            noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy*mask+(1.-mask)*y_0], dim=1), sample_gammas)
            loss = self.loss_fn(mask*noise, mask*noise_hat)
        else:
            noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy], dim=1), sample_gammas)
            loss = self.loss_fn(noise, noise_hat)
        return loss
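For reference, here is a minimal NumPy sketch (my own toy analog, not the repo's torch code) of what the composition `y_noisy*mask + (1.-mask)*y_0` does: pixels where mask is 0 pass through as exact ground truth, while masked pixels stay noisy. This is precisely the question's point, since the network already sees the known context this way.

```python
import numpy as np

# Toy analog of y_noisy*mask + (1.-mask)*y_0 from the snippet above.
# Convention assumed here: mask == 1 marks the region to inpaint,
# mask == 0 marks known (ground-truth) pixels.
rng = np.random.default_rng(0)
y_0 = rng.normal(size=(1, 3, 4, 4))       # ground-truth image
noise = rng.normal(size=y_0.shape)        # stand-in for sampled noise
y_noisy = 0.5 * y_0 + 0.5 * noise         # stand-in for q_sample output
mask = np.zeros_like(y_0)
mask[..., :2] = 1.0                       # inpaint the top half

composed = y_noisy * mask + (1.0 - mask) * y_0

# Known (mask == 0) pixels are exactly the ground truth ...
assert np.array_equal(composed[mask == 0], y_0[mask == 0])
# ... while masked (mask == 1) pixels still carry noise.
assert not np.allclose(composed[mask == 1], y_0[mask == 1])
```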

PouriaRouzrokh avatar Jul 21 '22 15:07 PouriaRouzrokh

Hi, thanks for this great question, and I think there are two potential considerations for this:

  1. It keeps training and inference consistent across all tasks: at inference, the model samples starting from random noise together with y_cond.
  2. y_cond helps the model distinguish the masked from the unmasked areas, since y_t alone may not make this clear when t is small.
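To illustrate point 1, here is a hedged toy sketch (names like `denoise_fn` and the update rule are placeholders, not the repo's API or the real DDPM math): the denoiser receives the same 6-channel `concat([y_cond, image])` layout at inference as it does in the training `forward` above, so the network architecture and input semantics never change between the two stages.

```python
import numpy as np

def denoise_fn(x, t):
    # Stand-in network: only checks the expected 6-channel input
    # (3 channels of y_cond + 3 channels of the current image).
    assert x.shape[1] == 6
    return x[:, 3:]  # pretend this is the predicted noise

y_cond = np.zeros((1, 3, 4, 4))                           # conditioning image
y_t = np.random.default_rng(1).normal(size=(1, 3, 4, 4))  # start from noise

for t in range(3):  # toy reverse loop, not the real sampler
    noise_hat = denoise_fn(np.concatenate([y_cond, y_t], axis=1), t)
    y_t = y_t - 0.1 * noise_hat  # placeholder update, not DDPM math

assert y_t.shape == (1, 3, 4, 4)
```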

Janspiry avatar Jul 22 '22 09:07 Janspiry

> Hi, thanks for this great question, and I think there are two potential considerations for this:
>
>   1. Keep the consistency between training and inference over all tasks. The model samples from random noise and y_cond in the inference stage.
>   2. y_cond can distinguish between the mask and unmasked areas since y_t may not be straightforward enough when t is small.

Thanks for the kind reply. This makes sense, though the second reason is worth testing empirically. I will post here if I find something different.

PouriaRouzrokh avatar Jul 25 '22 18:07 PouriaRouzrokh

Feel free to reopen the issue if there is any question.

Janspiry avatar Aug 25 '22 06:08 Janspiry

@PouriaRouzrokh I have opened a separate issue on this, but I am in urgent need of a solution, so I wanted to check with you as well.

In my inpainting case, only the y_cond and mask images are available during inference. In that case, how should I run inference?

In the network.py script, the line below is executed as part of the restoration function for the inpainting task. Since y_0 is None for me, I am not sure how to handle it. If I skip the line, the results are very bad (only a whitish image is generated). Also, in the Process.png image I can see the noise level increasing at each step rather than decreasing.

if mask is not None:
    y_t = y_0*(1.-mask) + mask*y_t
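One hedged observation (my own sketch, not from the repo): if y_cond equals the ground truth on the unmasked pixels and the mask is binary, then y_cond can stand in for y_0 in this replacement step, because `y_cond*(1.-mask)` and `y_0*(1.-mask)` agree everywhere that `(1.-mask)` is nonzero.

```python
import numpy as np

# Assumption: y_cond is the ground truth outside the mask (hole zeroed),
# and mask is binary with 1 marking the region to inpaint.
rng = np.random.default_rng(2)
y_0 = rng.normal(size=(1, 3, 4, 4))       # ground truth (unavailable at inference)
mask = np.zeros_like(y_0)
mask[..., :2] = 1.0
y_cond = y_0 * (1.0 - mask)               # known pixels kept, hole zeroed
y_t = rng.normal(size=y_0.shape)          # current sample in the reverse loop

with_y0 = y_0 * (1.0 - mask) + mask * y_t      # the repo's line
with_cond = y_cond * (1.0 - mask) + mask * y_t  # y_cond substituted for y_0

# Under the stated assumption, the two are identical.
assert np.allclose(with_y0, with_cond)
```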

Any idea on how to proceed?

vinodrajendran001 avatar Nov 14 '22 09:11 vinodrajendran001

@Janspiry Why not just use the mask as y_cond, for consistency among all tasks?

yc-cui avatar Jan 26 '23 04:01 yc-cui