DiffusionDet
About training loss
In DDIM or DDPM, there are losses (KL-divergence terms) that constrain the diffused outputs during training to be Gaussian distributions. I thought this was the basis for DDIM sampling (the reverse process). However, in DiffusionDet, only the set prediction loss is used. So how can DDIM work without this training constraint?
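For reference, the KL terms mentioned above reduce (with the standard reweighting in the DDPM paper) to the simplified noise-prediction objective used for both DDPM and DDIM training:

$$
\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2\Big]
$$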
Hi,
The set prediction loss contains one term $\mathcal{L}_{L1}$ defined here, which measures the mean absolute error (L1 distance) between each element of the ground-truth boxes and the predicted boxes.
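A minimal sketch of that term (assuming the predicted and ground-truth boxes have already been matched and normalized; the tensor names below are hypothetical, not the repository's actual API):

```python
import torch
import torch.nn.functional as F

def l1_box_loss(pred_boxes: torch.Tensor, gt_boxes: torch.Tensor) -> torch.Tensor:
    """Mean absolute error between matched predicted and ground-truth boxes.

    pred_boxes, gt_boxes: (num_matched, 4) tensors in (cx, cy, w, h) format,
    normalized to [0, 1]. The matching itself is assumed to be done upstream.
    """
    return F.l1_loss(pred_boxes, gt_boxes, reduction="mean")
```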
As presented in Algorithm 1 (Training) of DDPM (https://arxiv.org/pdf/2006.11239v2.pdf), in step 5 a gradient descent step is taken to constrain the diffused output to be Gaussian. Does DiffusionDet need such a loss to constrain the corrupted bboxes to be Gaussian?
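A minimal sketch of that training step, assuming a hypothetical `eps_model` denoiser and a precomputed cumulative schedule `alpha_bar`:

```python
import torch

def ddpm_training_step(eps_model, x0, alpha_bar):
    """One iteration of DDPM Algorithm 1 (Training), as a sketch.

    eps_model : network predicting the added noise, eps_model(x_t, t)
    x0        : clean data, shape (B, ...)
    alpha_bar : cumulative products of (1 - beta_t), shape (T,)
    """
    B, T = x0.shape[0], alpha_bar.shape[0]
    t = torch.randint(0, T, (B,), device=x0.device)     # step 3: t ~ Uniform({1..T})
    eps = torch.randn_like(x0)                           # step 4: eps ~ N(0, I)
    a = alpha_bar[t].view(B, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps           # forward diffusion q(x_t | x_0)
    return ((eps - eps_model(x_t, t)) ** 2).mean()       # step 5: noise-prediction MSE
```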
Hi @ShoufaChen @huilicici , firstly, thanks to the authors for their good work.
Actually, I have the same confusion. DDIM applies an MSE loss between the Gaussian noise and the output of the denoiser (U-Net) during the training stage. However, in DiffusionDet, the denoiser (the cascade decoder) seems to be optimized directly to refine the noisy boxes into the ground-truth boxes, which works very differently from conventional DDIM.
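To make the contrast concrete, here is a sketch of the training target described above (not the repository's actual code): instead of regressing the noise, a hypothetical `decoder` is supervised to output the clean boxes directly, using the same `alpha_bar` schedule as in the DDPM sketch.

```python
import torch
import torch.nn.functional as F

def x0_prediction_style_step(decoder, features, gt_boxes, alpha_bar):
    """Sketch of the direct box-regression style training described above.

    decoder   : cascade decoder, decoder(features, noisy_boxes, t) -> predicted boxes
    gt_boxes  : (B, N, 4) padded/repeated ground-truth boxes
    alpha_bar : cumulative noise schedule, shape (T,)
    """
    B, T = gt_boxes.shape[0], alpha_bar.shape[0]
    t = torch.randint(0, T, (B,), device=gt_boxes.device)
    eps = torch.randn_like(gt_boxes)
    a = alpha_bar[t].view(B, 1, 1)
    noisy_boxes = a.sqrt() * gt_boxes + (1 - a).sqrt() * eps   # corrupt GT boxes with Gaussian noise
    pred_boxes = decoder(features, noisy_boxes, t)             # decoder predicts clean boxes
    return F.l1_loss(pred_boxes, gt_boxes)                     # supervised directly against GT
                                                               # (the full set prediction loss in practice)
```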
I am not sure whether this can be seen as introducing a denoising task like DN-DETR does. Based on this understanding, the number of sampling steps at inference should also not have an observable influence on the detection performance.
I also have the same confusion. Waiting for an answer.
I have the same question as @gugite. Can you please respond to this, @ShoufaChen? It would be very helpful.