
clip_denoised purpose?

Open malekinho8 opened this issue 2 years ago • 8 comments

Hello, I was wondering why we have to clamp the output of x_start from predict_start_from_noise( ) when sampling:

def p_mean_variance(self, x, t, clip_denoised: bool):
    model_output = self.denoise_fn(x, t)

    if self.objective == 'pred_noise':
        x_start = self.predict_start_from_noise(x, t = t, noise = model_output) # i don't understand this
    elif self.objective == 'pred_x0':
        x_start = model_output
    else:
        raise ValueError(f'unknown objective {self.objective}')

    if clip_denoised:
        x_start.clamp_(-1., 1.)

    model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start = x_start, x_t = x, t = t)
    return model_mean, posterior_variance, posterior_log_variance
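For background, predict_start_from_noise just inverts the forward-process identity x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε to recover the model's current estimate of x_0 from its noise prediction. A minimal scalar sketch (the names alpha_bars and predict_start_from_noise_sketch are illustrative placeholders, not the repo's actual buffers or signature):

```python
import math

def predict_start_from_noise_sketch(x_t, t, noise, alpha_bars):
    # Invert the forward-process identity x_t = sqrt(ab_t)*x_0 + sqrt(1-ab_t)*eps
    ab = alpha_bars[t]
    return (x_t - math.sqrt(1.0 - ab) * noise) / math.sqrt(ab)

# Round-trip check with scalar "images": diffuse a known x_0, then recover it.
alpha_bars = [0.99, 0.95, 0.90]
x0, eps, t = 0.5, -1.2, 1
xt = math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1 - alpha_bars[t]) * eps
recovered = predict_start_from_noise_sketch(xt, t, eps, alpha_bars)
print(abs(recovered - x0) < 1e-9)  # True: exact inversion when the true noise is known
```

At sampling time the model's noise prediction is imperfect, so the recovered x_start can land outside [-1, 1], which is what the clamp guards against.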

I did not see any mention of this constraint in the original DDPM paper, and in this other Google Colab implementation, they do not apply any clamping during the sampling process. Is there a reason it was done in this code?

malekinho8 avatar Jul 16 '22 15:07 malekinho8

I believe it is because this implementation converts the output of the model back to images in the sampling function: https://github.com/lucidrains/denoising-diffusion-pytorch/blob/37334ae82467197ba3df194cae3a85332f5736be/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py#L535

So, to ensure the final image lies in [0, 1] after unnormalizing, the model's output (i.e., the denoised image) must lie in [-1, 1], which explains why the outputs are clipped.
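A tiny sketch of that argument (unnormalize_to_zero_to_one mirrors the helper in this repo; clamp is a scalar stand-in for the tensor clamp_):

```python
def unnormalize_to_zero_to_one(t):
    # Map model-space values in [-1, 1] back to image space [0, 1]
    return (t + 1) * 0.5

def clamp(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))

raw = 1.8  # an out-of-range denoised value the model might emit
print(unnormalize_to_zero_to_one(raw))         # ~1.4 -> invalid pixel intensity
print(unnormalize_to_zero_to_one(clamp(raw)))  # 1.0  -> valid after clamping
```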

xenova avatar Jul 24 '22 14:07 xenova

@xenova But that should be a final step, not done on every iteration.

The problem I'm experiencing is that the reverse process doesn't converge toward the mean/variance of the source data. Samples are oversaturated. Removing clipping makes this even worse. I suspect that the clipping is a band-aid for another problem.

almson avatar Jul 31 '22 08:07 almson

> But that should be a final step, not done on every iteration. […] I suspect that the clipping is a band-aid for another problem.

Out of curiosity, what beta/noise scheduler are you using? I've had issues in the past with the cosine beta scheduler for some reason.

xenova avatar Aug 01 '22 08:08 xenova

> Out of curiosity, what beta/noise scheduler are you using? I've had issues in the past with the cosine beta scheduler for some reason.

I'm not sure about @xenova's case, but I was using the cosine scheduler and ran into the mean/variance blowing up in the output. In a related issue, another user found that training with the pred_x0 objective instead of the default pred_noise objective actually led to better samples, and I also found that with clip_denoised set to False, the output became oversaturated and exploded away from the [-1, 1] range.

One more subtlety: the cosine schedule is not necessarily the best beta schedule, and it may not be the best fit with the simple pred_noise training objective/loss function used in this repository:

[Screenshot: schedule comparison, taken from Nichol and Dhariwal 2021]
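For reference, the cosine schedule from Nichol and Dhariwal can be sketched in a few lines (a simplified pure-Python scalar version, not the repo's torch implementation):

```python
import math

def cosine_beta_schedule(timesteps, s=0.008):
    # alpha_bar(t) = cos^2(((t/T + s)/(1 + s)) * pi/2), normalized so alpha_bar(0) = 1
    f = lambda t: math.cos(((t / timesteps + s) / (1 + s)) * math.pi / 2) ** 2
    alpha_bars = [f(t) / f(0) for t in range(timesteps + 1)]
    # beta_t = 1 - alpha_bar(t) / alpha_bar(t-1), clipped to avoid singularities near T
    return [min(1 - alpha_bars[t] / alpha_bars[t - 1], 0.999)
            for t in range(1, timesteps + 1)]

betas = cosine_beta_schedule(1000)
print(len(betas), betas[0] < betas[-1])  # betas grow toward the end of diffusion
```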

malekinho8 avatar Aug 01 '22 13:08 malekinho8

@malekinho8
Hi, is there an answer to this problem yet? I'm confused about the purpose of clip_denoised and its range. I noticed that in improved DDPM they scale image intensity values to [-1, 1]; I'm not sure whether that's related to the [-1, 1] range used in clip_denoised.

Looking forward to your reply.

GYDDHPY avatar May 25 '23 12:05 GYDDHPY

> I am confused about the purpose of clip_denoised and the range of it. […]

I ran into this as well. Is it related to the range of the data? If the data range is [0, 1], does the clamp range here also need to be changed accordingly?

yanglibo0512 avatar Dec 14 '23 15:12 yanglibo0512

@yanglibo0512 the data is normalized to -1 to 1 here
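To illustrate (these mirror the repo's normalize/unnormalize helpers, simplified to scalars): the clamp bounds should match the model-space range, so data normalized to [-1, 1] pairs with clamp_(-1., 1.); if you instead trained on raw [0, 1] data, the corresponding clamp would be [0, 1].

```python
def normalize_to_neg_one_to_one(img):
    # [0, 1] image -> [-1, 1] model space (as done at training time here)
    return img * 2 - 1

def unnormalize_to_zero_to_one(t):
    # inverse mapping, applied when sampling
    return (t + 1) * 0.5

pixels = [0.0, 0.25, 1.0]
model_space = [normalize_to_neg_one_to_one(p) for p in pixels]
print(model_space)                 # [-1.0, -0.5, 1.0] -- matches the clamp bounds
roundtrip = [unnormalize_to_zero_to_one(v) for v in model_space]
print(roundtrip == pixels)         # True
```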

lucidrains avatar Dec 14 '23 16:12 lucidrains

> @yanglibo0512 the data is normalized to -1 to 1 here

I see. So the clamp range here should correspond to the range of the data?

yanglibo0512 avatar Dec 15 '23 00:12 yanglibo0512