Diff-UNet

About The Testing

Open YonghanLU opened this issue 1 year ago • 8 comments

From Figure 1 in your paper, we know that your method directly predicts x_0 during training, but why do you infer the result step by step at test time? I do not understand.

YonghanLU avatar Apr 14 '23 02:04 YonghanLU

You may want to read up on how diffusion models work. Training needs only one step per iteration (a single sampled timestep), whereas testing needs multiple steps.
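As a rough illustration of the "one step in training" part, the sketch below noises a clean target to a single randomly drawn timestep, which is all one training iteration needs. The linear beta schedule, the step count `T = 1000`, and the helper names are assumptions for illustration and are not taken from the Diff-UNet code.

```python
import math
import random

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products alpha_bar[t] for an assumed linear beta schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; no step-by-step chain is needed."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

def training_example(x0, alpha_bar, rng):
    """One training iteration: one random timestep t and its noisy input x_t.
    With x_0-prediction, the regression target is x0 itself."""
    t = rng.randrange(len(alpha_bar))
    x_t = q_sample(x0, t, alpha_bar, rng)
    return t, x_t, x0
```

The network would then be trained to map `(x_t, t)`, plus the conditioning image, back to `x0`; the multi-step loop only appears at sampling time.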

920232796 avatar Apr 14 '23 12:04 920232796

I think I know a little about diffusion, but I still do not understand. If you predict x_0 directly during training, how do you perform multi-step inference at test time? Where are the parameters for the Gaussian noise? Do you still predict the noise, but instead of starting from independent Gaussian noise, predict the segmentation map from step t (and then predict x_0 step by step)?

YonghanLU avatar Apr 14 '23 14:04 YonghanLU

In this task, Diff-UNet starts from independent Gaussian noise, but predicts the segmentation map (x_0) instead of the noise. You can see the DDIM update formula in this repo.
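For readers who want the update spelled out, here is a minimal sketch of deterministic DDIM sampling (eta = 0) with a model that outputs x_0. `predict_x0` is a hypothetical stand-in for the network, and the argument names are assumptions; the repository remains the reference for the actual implementation.

```python
import math
import random

def ddim_sample_x0(predict_x0, alpha_bar, timesteps, length, rng):
    """Deterministic DDIM (eta = 0) where the model outputs x_0 rather than noise.

    predict_x0(x_t, t) -> estimate of x_0 (hypothetical stand-in for the network)
    alpha_bar[t]       -> cumulative product of (1 - beta) up to step t
    timesteps          -> decreasing list of steps to visit, e.g. [999, 899, ...]
    length             -> number of values in the (flattened) sample
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        x0_hat = predict_x0(x, t)
        if i + 1 == len(timesteps):
            return x0_hat  # last visited step: return the clean estimate
        # Noise implied by the x_0 estimate: eps = (x_t - sqrt(a_t) x0) / sqrt(1 - a_t)
        eps_hat = [(xv - math.sqrt(a_t) * x0v) / math.sqrt(1.0 - a_t)
                   for xv, x0v in zip(x, x0_hat)]
        a_prev = alpha_bar[timesteps[i + 1]]
        # Move to the less noisy level: x_prev = sqrt(a_prev) x0 + sqrt(1 - a_prev) eps
        x = [math.sqrt(a_prev) * x0v + math.sqrt(1.0 - a_prev) * ev
             for x0v, ev in zip(x0_hat, eps_hat)]
    return x
```

Each iteration re-estimates x_0 from the current noisy sample and then re-noises that estimate to the next, lower noise level, which is why inference takes several steps even though the training target is x_0.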

920232796 avatar Apr 14 '23 15:04 920232796

Why predict x_0 directly? To reduce memory consumption to accommodate 3D data?

YonghanLU avatar Apr 15 '23 08:04 YonghanLU

I am also confused by this part, and I don't think it should be considered a standard setting for diffusion models.

> Why predict x_0 directly? To reduce memory consumption to accommodate 3D data?

Fivethousand5k avatar May 14 '23 09:05 Fivethousand5k

Based on my experiments, at inference the trained Diff-UNet can yield satisfactory results within the first few steps (even the first step). However, during training, such a noise level occurs only when t is relatively small (the early steps of forward diffusion: x0 -> x1 -> x2 ...). This means there is a gap in noise level between training and inference.
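To make "noise level" concrete, the sketch below prints how much of the clean signal survives at a few timesteps. The linear beta schedule and `T = 1000` are assumptions chosen for illustration; the schedule actually used by Diff-UNet may differ.

```python
import math

def noise_levels(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Return (signal_scale, noise_std) per timestep for an assumed linear schedule."""
    levels, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * i / (T - 1))
        levels.append((math.sqrt(prod), math.sqrt(1.0 - prod)))
    return levels

levels = noise_levels()
for t in (0, 100, 500, 999):
    signal, noise = levels[t]
    print(f"t={t:4d}  signal={signal:.3f}  noise_std={noise:.3f}")
```

Under these assumptions the signal scale is close to 1 at small t and close to 0 at the last step, so a nearly clean sample corresponds to small t in training terms.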

Moreover, since Diff-UNet is always optimized towards x0 rather than noise, I am not sure whether it can still be considered a diffusion model. Maybe it is more appropriate to categorize it as a kind of recurrent model?

Anyway, I mean no offense to your work and just want to share some of my ideas ^-^, and I wish you good luck with your submissions.

Fivethousand5k avatar May 14 '23 10:05 Fivethousand5k

> I am also confused by this part, and I don't think it should be considered a standard setting for diffusion models.
>
> Why predict x_0 directly? To reduce memory consumption to accommodate 3D data?

You can refer to this paper co-authored by Hinton: A Generalist Framework for Panoptic Segmentation of Images and Videos.

I think the target of a segmentation task is simple (the model only predicts discrete labels 0, 1, 2, ..., not continuous data), so it can predict x_0 directly.
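For context, predicting x_0 and predicting the noise are two parameterizations of the same quantity once x_t is known. Under the standard forward process (the notation here is assumed, not taken from the paper):

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
\quad\Longrightarrow\quad
\hat\epsilon = \frac{x_t - \sqrt{\bar\alpha_t}\, \hat{x}_0}{\sqrt{1 - \bar\alpha_t}}
```

So an estimate of x_0 determines an estimate of the noise, and vice versa; which one the network outputs is a design choice.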

920232796 avatar May 15 '23 03:05 920232796

> Based on my experiments, at inference the trained Diff-UNet can yield satisfactory results within the first few steps (even the first step). However, during training, such a noise level occurs only when t is relatively small (the early steps of forward diffusion: x0 -> x1 -> x2 ...). This means there is a gap in noise level between training and inference.
>
> Moreover, since Diff-UNet is always optimized towards x0 rather than noise, I am not sure whether it can still be considered a diffusion model. Maybe it is more appropriate to categorize it as a kind of recurrent model?
>
> Anyway, I mean no offense to your work and just want to share some of my ideas ^-^, and I wish you good luck with your submissions.

My work also builds on other excellent work. We can discuss diffusion models further if you are interested in this topic.

920232796 avatar May 15 '23 03:05 920232796