MedSegDiff
loss problem
(1) As I understand it, mse_diff here predicts the noise: the target (the added noise) has shape [b,1,h,w], but model_output has shape [b,2,h,w]. In a previous issue you answered that the two channels represent the mean and the variance; can you explain how computing MSE between tensors with different channel counts makes sense?
(2) In loss_cal the target is the segmentation GT. Does the model's cal output represent the predicted segmentation result? Can the cal output be used directly to measure the model's segmentation accuracy at the inference stage?
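If cal is indeed a sigmoid probability map, then at inference it could be thresholded and scored against the GT directly. Below is a hypothetical NumPy sketch (not the repo's actual evaluation code; `dice_score` and its arguments are illustrative names) of that idea:

```python
import numpy as np

def dice_score(prob_map, gt, thresh=0.5):
    """Threshold a sigmoid probability map and compute the Dice
    coefficient against the ground-truth mask.
    Hypothetical helper for evaluating a 'cal'-style output."""
    pred = (prob_map > thresh).astype(np.float32)
    inter = (pred * gt).sum()
    # Small epsilon avoids division by zero on empty masks.
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

prob = np.array([[0.9, 0.1], [0.8, 0.2]])  # sigmoid outputs
gt = np.array([[1.0, 0.0], [1.0, 0.0]])    # ground-truth mask
print(dice_score(prob, gt))  # ≈ 1.0 for a perfect match
```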
(3) Can you explain what sample, x_noisy, org, cal, and cal_out each mean?
Question 1: During the training stage, does the [cal] output mean the segmentation map? I see that it is obtained by an additional segmentation head that takes the original image and representations from different layers of cat(img, mask), followed by a sigmoid.
Question 2: During the training stage, [model_output] is the noise learned by the diffusion model, which in my understanding should have shape [1,1,H,W]. However, I got [1,2,H,W]. The code is
I think the 2 in the shape above is the num_class we pass in. So what does the channel dimension of 2 in model_output mean? I saw your earlier answer that it refers to the mean and variance, but I cannot understand it, and I am also confused about the mse_diff loss given the dimension mismatch.
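For reference, diffusion models with a learned variance (as in improved DDPM / guided-diffusion-style code) typically resolve this mismatch by splitting the [B, 2C, H, W] output into a predicted-noise half and a variance half, and computing the MSE only against the noise half. A minimal NumPy sketch of that split (illustrative only; not the repo's actual code, and the names are hypothetical):

```python
import numpy as np

def split_and_mse(model_output, target):
    """Split a [B, 2C, H, W] output into a noise half and a
    variance half, then compute MSE between the noise half and
    the target noise. Illustrative sketch only."""
    c = model_output.shape[1] // 2
    eps_pred = model_output[:, :c]   # predicted noise, [B, C, H, W]
    var_pred = model_output[:, c:]   # variance channels, [B, C, H, W]
    mse = np.mean((eps_pred - target) ** 2)
    return mse, var_pred

# target noise [B, 1, H, W]; model output [B, 2, H, W]
out = np.zeros((4, 2, 8, 8))
tgt = np.zeros((4, 1, 8, 8))
mse, var_pred = split_and_mse(out, tgt)
print(mse)             # 0.0 for all-zero inputs
print(var_pred.shape)  # (4, 1, 8, 8)
```

The variance half is not dropped; in improved-DDPM-style training it feeds a separate variational-bound term, which is why the MSE itself only sees the first half.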
Question 3: You have mse_diff, and a loss_cal with respect to the mask. Intuitively, during the reverse inference stage, if I understand correctly, you would have two ways to get the segmentation: one is cal itself, and the other is the segmentation recovered by the diffusion model. Did you compare these two kinds of results?
Question 4: I want to confirm that the target and model_output in your code are the noise, right? Rather than x_{t-1}?
My understanding is that when the variance is learnable the output is [1,2,H,W]; otherwise it is [1,1,H,W]. I'm not sure if that's right.
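If that reading is right, the channel count would be decided at model construction time. A hypothetical sketch of the usual pattern (`learn_sigma` is the flag name commonly used in guided-diffusion-style code, not necessarily this repo's):

```python
def output_channels(num_classes, learn_sigma):
    """Double the output channels when the variance is learned,
    so each predicted channel gets a matching variance channel."""
    return num_classes * 2 if learn_sigma else num_classes

print(output_channels(1, True))   # 2 -> output shape [1, 2, H, W]
print(output_channels(1, False))  # 1 -> output shape [1, 1, H, W]
```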