ResNetFusion
Hello, your code helped me a lot, but I still have doubts about TV_loss, Image_loss and LaplacianLoss.
Hi, I understand that this code implements the paper "Infrared and visible image fusion via detail preserving adversarial learning", but I have studied the loss-function file `loss.py` carefully for a long time and still have some questions. I hope you can help me answer them, thanks a lot.

- About `tv_loss` in `loss.py`: I understand that this is a loss function for image denoising, but your `class TVLoss(nn.Module)` differs from the usual TV loss in its return value. The usual TVLoss returns `self.TVLoss_weight * 2 * (h_tv / count_h + w_tv / count_w) / batch_size`, but here you return `self.tv_loss_weight * 2 * (h_tv[:, :, :h_x - 1, :w_x - 1] + w_tv[:, :, :h_x - 1, :w_x - 1])`. What does this do? And is `tv_loss` reflected in the original paper? I did not see it there.
- In your `class LaplacianLoss(nn.Module)`, the final return value is `self.laplacian_filter(x) ** 2`. The paper's target edge-enhancement loss is written in terms of the gradient, so why do you return the square of the gradient value here?
- For the target edge-enhancement loss of G(x, y) in the paper, the code is `coefficient = pyramid_addition * alpha / 2 + 1`. My understanding is that `pyramid_addition` is G(x, y), but why multiply it by `alpha / 2` and add `1`?

I hope you can help me with these questions in your free time. Thank you, and I wish you all the best!

Yours sincerely, Yannik Yang
1. I think it is a computational trick so that `h_tv` and `w_tv` keep the same shape.
2. I am not sure whether negative values are valid for the Gaussian blur or not. Since the gradient map corresponds to the image edge map, I think it might be better when all values are positive.
3. In `pyramid_addition * alpha / 2 + 1` there is an implicit broadcast operation. However, I think that `pyramid_addition * alpha + 1` would be enough.
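To make point 1 concrete, here is a minimal sketch of the modified `TVLoss` as reconstructed from the snippets quoted in this thread (the actual `loss.py` may differ in details such as the weight handling). Unlike the usual TV loss, which reduces to a scalar, this version returns a per-pixel gradient-energy map, and the crops align `h_tv` and `w_tv` to a common shape so they can be added elementwise:

```python
import torch
import torch.nn as nn

class TVLoss(nn.Module):
    """Sketch of the TVLoss variant discussed above: returns a per-pixel
    gradient-energy map instead of a scalar."""

    def __init__(self, tv_loss_weight=1.0):
        super().__init__()
        self.tv_loss_weight = tv_loss_weight

    def forward(self, x):
        h_x, w_x = x.size(2), x.size(3)
        # squared vertical adjacent-pixel differences: (B, C, H-1, W)
        h_tv = torch.pow(x[:, :, 1:, :] - x[:, :, :h_x - 1, :], 2)
        # squared horizontal adjacent-pixel differences: (B, C, H, W-1)
        w_tv = torch.pow(x[:, :, :, 1:] - x[:, :, :, :w_x - 1], 2)
        # crop both maps to (B, C, H-1, W-1) so they can be summed elementwise
        return self.tv_loss_weight * 2 * (
            h_tv[:, :, :h_x - 1, :w_x - 1] + w_tv[:, :, :h_x - 1, :w_x - 1]
        )
```

For an 8x8 input the returned map is 7x7 per channel, which is what makes the later `mse_loss` between gradient maps well defined.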
> I think it is a computational trick so that `h_tv` and `w_tv` keep the same shape.
Thank you so much for your reply; I am full of gratitude. I would especially like to consult you about `tv_loss` again, because there is no `tv_loss` in the author's paper, yet it appears in the code for the paper, and I still don't understand it. You just said it could be designed to keep the same shape, but in the code, `h_tv[:, :, :h_x - 1, :w_x - 1]` and `w_tv[:, :, :h_x - 1, :w_x - 1]` already have the same shape. And in `tv_loss = self.mse_loss(self.tv_loss(out_images), self.tv_loss(target_images) + self.tv_loss(target_ir))`, what shape are they trying to keep? I still don't understand the effect of `tv_loss` in the code, or whether it is redundant. Thank you again for your reply; I wish you all the best.

Yours sincerely, Yannik
In `tv_loss = self.mse_loss(self.tv_loss(out_images), self.tv_loss(target_images) + self.tv_loss(target_ir))`, `self.tv_loss` computes the sum of the x- and y-axis image gradients. This `tv_loss` is designed to keep the gradients of the fused image consistent with the gradients of the IR and visible images. I think the `tv_loss` was called `L_{gradient}` in the paper.
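The gradient-consistency term just described could be sketched like this (using made-up tensors and a stand-in `tv_map` function for the repository's `self.tv_loss`, not the actual training code):

```python
import torch
import torch.nn.functional as F

def tv_map(x):
    """Per-pixel sum of squared adjacent-pixel differences along height and
    width, cropped to a common (B, C, H-1, W-1) shape."""
    h, w = x.size(2), x.size(3)
    h_tv = (x[:, :, 1:, :] - x[:, :, :h - 1, :]) ** 2
    w_tv = (x[:, :, :, 1:] - x[:, :, :, :w - 1]) ** 2
    return 2 * (h_tv[:, :, :, :w - 1] + w_tv[:, :, :h - 1, :])

# MSE between the fused image's gradient map and the sum of the
# visible and IR gradient maps, as in the line quoted above.
fused = torch.rand(1, 1, 16, 16)
vis = torch.rand(1, 1, 16, 16)
ir = torch.rand(1, 1, 16, 16)
tv_loss = F.mse_loss(tv_map(fused), tv_map(vis) + tv_map(ir))
```

The loss is small when the fused image's gradient energy at each pixel matches the combined gradient energy of the two source images.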
Thank you very much for your answer; I seem to understand part of it. But I found that there is no gradient operator in the calculation of `tv_loss`. The code computes the difference between the intensities of adjacent pixels in the x- and y-directions of the image, so I can only see an intensity calculation that reflects the contrast of the image, not a gradient calculation that reflects the texture details. Even assuming it is a gradient calculation, according to the original paper `L_{gradient} = (D_v - D_f)^2`, the code should be `tv_loss = self.mse_loss(self.tv_loss(out_images), self.tv_loss(target_images))`. Why does it keep the fused image consistent with the sum of the IR and visible images' adjacent-pixel differences?

I sincerely thank you for answering my question, which has bothered me for a long time.
```python
h_tv = torch.pow(x[:, :, 1:, :] - x[:, :, :h_x - 1, :], 2)  # h_tv: (B, C, H-1, W)
w_tv = torch.pow(x[:, :, :, 1:] - x[:, :, :, :w_x - 1], 2)  # w_tv: (B, C, H, W-1)
```

These squared adjacent-pixel differences are the image gradient.
As for `tv_loss = self.mse_loss(self.tv_loss(out_images), self.tv_loss(target_images))`: I agree with you.
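The corrected form agreed on here, matching the paper's `L_{gradient} = (D_v - D_f)^2`, could be sketched as follows (with `tv_map` standing in for the repository's `self.tv_loss`, and made-up tensors):

```python
import torch
import torch.nn.functional as F

def tv_map(x):
    # per-pixel squared adjacent differences along height and width,
    # cropped to a common (B, C, H-1, W-1) shape
    h, w = x.size(2), x.size(3)
    h_tv = (x[:, :, 1:, :] - x[:, :, :h - 1, :]) ** 2
    w_tv = (x[:, :, :, 1:] - x[:, :, :, :w - 1]) ** 2
    return 2 * (h_tv[:, :, :, :w - 1] + w_tv[:, :, :h - 1, :])

# L_gradient: the fused image's gradient map is pulled toward the
# visible image's gradient map only, not toward the IR + visible sum.
fused = torch.rand(1, 1, 16, 16)
vis = torch.rand(1, 1, 16, 16)
tv_loss = F.mse_loss(tv_map(fused), tv_map(vis))
```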
h_tv = torch.pow((x[:, :, 1:, :] - x[:, :, :h_x - 1, :]), 2) ---> h_tv (B×C×H-1×W) w_tv = torch.pow((x[:, :, :, 1:] - x[:, :, :, :w_x - 1]), 2) ---> w_tv (B×C×H×W-1)
image gradient
Thank you so much for your answer; today I finally figured out what `tv_loss` means. But is the square of the intensity difference of adjacent pixels really the gradient? And why is the gradient generally calculated by convolving with a gradient-operator kernel instead of by squaring the difference of adjacent pixels?
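On this last point, it may help to note that the adjacent-pixel difference is itself a convolution with the smallest possible gradient kernel, `[-1, 1]`; larger operators such as Sobel just add smoothing and weighting on top. A quick sketch with a made-up input (not the repository's code):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 5, 5)

# forward difference along the width dimension
diff = x[:, :, :, 1:] - x[:, :, :, :-1]

# the same thing expressed as a convolution with a [-1, 1] kernel
# (PyTorch's conv2d is cross-correlation, so out[j] = -x[j] + x[j+1])
k = torch.tensor([[[[-1.0, 1.0]]]])  # shape (1, 1, 1, 2)
conv = F.conv2d(x, k)

assert torch.allclose(diff, conv, atol=1e-6)
```

So the finite difference in `TVLoss` and a gradient-operator convolution are the same idea; the squaring afterwards just makes the map a non-negative gradient energy.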