denoising-diffusion-pytorch
clip_denoised purpose?
Hello, I was wondering why we have to clamp the output x_start from predict_start_from_noise() when sampling:
def p_mean_variance(self, x, t, clip_denoised: bool):
    model_output = self.denoise_fn(x, t)

    if self.objective == 'pred_noise':
        x_start = self.predict_start_from_noise(x, t = t, noise = model_output) # i don't understand this
    elif self.objective == 'pred_x0':
        x_start = model_output
    else:
        raise ValueError(f'unknown objective {self.objective}')

    if clip_denoised:
        x_start.clamp_(-1., 1.)

    model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start = x_start, x_t = x, t = t)
    return model_mean, posterior_variance, posterior_log_variance
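For context on the line marked above: predict_start_from_noise just inverts the forward-process equation x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps to solve for x_0. A minimal standalone sketch (scalar, dependency-free; the names and signature here are my own, not the repo's exact code):

```python
import math

def predict_start_from_noise(x_t, alpha_bar_t, noise):
    # Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    # Solving for x_0 gives the reconstruction used during sampling:
    return (x_t - math.sqrt(1.0 - alpha_bar_t) * noise) / math.sqrt(alpha_bar_t)

# Sanity check: if we knew the *true* noise, we would recover x_0 exactly.
x0, eps, a_bar = 0.4, -1.3, 0.7
x_t = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps
print(abs(predict_start_from_noise(x_t, a_bar, eps) - x0) < 1e-9)  # True
```

In practice the network's noise prediction is imperfect, so the recovered x_start can stray outside the training data's range, which is what the clamp guards against.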
I did not see any mention of imposing this constraint anywhere in the original DDPM paper, and in this other Google Colab implementation, they do not implement any sort of clamping during the sampling process. Is there a reason this was done in this code?
I believe it is because this implementation converts the output of the model back to images in the sampling function: https://github.com/lucidrains/denoising-diffusion-pytorch/blob/37334ae82467197ba3df194cae3a85332f5736be/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py#L535
So, to ensure that the final image lies between 0 and 1, the output from the model (i.e., the denoised image) must lie between -1 and 1, which explains why they clip the outputs.
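To illustrate the point about the ranges (a scalar sketch, not the repo's exact code): the sampler maps model-space [-1, 1] back to image-space [0, 1] with something like (x + 1) / 2, so any x_start that escapes [-1, 1] lands outside the valid pixel range unless it is clamped first:

```python
def clamp(x, lo=-1.0, hi=1.0):
    # Clip a value into [lo, hi], mirroring what x_start.clamp_(-1., 1.) does
    return max(lo, min(hi, x))

def unnormalize(x):
    # Map model-space [-1, 1] back to image-space [0, 1]
    return (x + 1.0) / 2.0

x_start = 1.6                       # a model output that escaped [-1, 1]
print(unnormalize(x_start))         # 1.3 -> invalid pixel value
print(unnormalize(clamp(x_start)))  # 1.0 -> valid after clipping
```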
@xenova But that should be a final step, not done on every iteration.
The problem I'm experiencing is that the reverse process doesn't converge toward the mean/variance of the source data. Samples are oversaturated. Removing clipping makes this even worse. I suspect that the clipping is a band-aid for another problem.
Out of curiosity, what beta/noise scheduler are you using? I've had issues in the past with the cosine beta scheduler for some reason.
I'm not sure about @xenova, but I was using the cosine scheduler and ran into issues with the mean/variance blowing up in the output. In a related issue, another user found that training with the pred_x0 objective instead of the default pred_noise objective actually led to better samples. I also found that when clip_denoised was set to False, the final output became oversaturated and exploded away from the [-1, 1] range. Another subtlety to note is that the cosine schedule is not necessarily the best beta schedule, and it may not be the best fit for the simple pred_noise training objective/loss function used in this code repository:
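For reference, the cosine schedule from Nichol & Dhariwal (2021) defines alpha_bar(t) via a squared cosine and clips each beta at 0.999 precisely because the ratio alpha_bar(t) / alpha_bar(t-1) collapses near t = T. A minimal sketch in plain Python (not the repo's implementation, which is vectorized in torch):

```python
import math

def cosine_betas(timesteps, s=0.008):
    # alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi/2), per Nichol & Dhariwal 2021
    def alpha_bar(t):
        return math.cos(((t / timesteps + s) / (1 + s)) * math.pi / 2) ** 2

    betas = []
    for i in range(timesteps):
        # beta_t = 1 - alpha_bar(t) / alpha_bar(t-1), clipped to tame the tail
        betas.append(min(1 - alpha_bar(i + 1) / alpha_bar(i), 0.999))
    return betas

betas = cosine_betas(1000)
print(betas[0] < betas[-1])  # True: noise level grows over time
print(max(betas) <= 0.999)   # True: the clip keeps the last steps finite
```

Without that 0.999 clip, the final betas approach 1, which is one plausible source of the exploding mean/variance mentioned above.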
(Taken from Nichol and Dhariwal 2021)
@malekinho8 Hi, is there an answer to this problem yet? I am confused about the purpose of clip_denoised and its range. I noticed that in improved DDPM, they scale image intensity values to [-1, 1]. I'm not sure whether that is related to the [-1, 1] range used in clip_denoised.
Looking forward to your reply.
I also ran into this problem. Is it related to the range of the data? If the data range is [0, 1], does the clamp range here also need to be changed accordingly?
@yanglibo0512 the data is normalized to -1 to 1 here
I see. What I want to know is whether the clamp range here should correspond to the range of the data?
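If one trained on data normalized to a different range, the clamp bounds would indeed need to match it; a hypothetical sketch (not from the repo, which fixes the range at [-1, 1]):

```python
def clamp(x, lo, hi):
    # Clip a value into [lo, hi]
    return max(lo, min(hi, x))

# Repo convention: images are scaled to [-1, 1], so the clamp is (-1, 1).
# If the data were instead kept in [0, 1], the matching clamp would be (0, 1).
DATA_LO, DATA_HI = -1.0, 1.0
print(clamp(1.7, DATA_LO, DATA_HI))   # 1.0
print(clamp(-2.3, DATA_LO, DATA_HI))  # -1.0
```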