
Question about the code in semantic_seg

Ianresearch opened this issue 4 years ago • 8 comments

Hi, I have a question about logit_scale and logit_bias in semantic_seg. The shape of these parameters is (1, num_classes, 1, 1); why is it not (1, num_classes, 512, 512), which would match the input image size for semantic segmentation?

Ianresearch avatar Dec 09 '21 03:12 Ianresearch

Hi, logit_scale and logit_bias are the class-wise magnitude and margin (one scalar per class), respectively, as described in the original paper.
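
If it helps, here is a minimal sketch of the idea (the module name DisAlignHead and its interface are illustrative assumptions, not the actual code in this repository):

import torch
import torch.nn as nn

class DisAlignHead(nn.Module):
    # Per-class calibration: one scale and one bias for each class,
    # broadcast over the batch and spatial dimensions of the logits.
    def __init__(self, num_classes):
        super().__init__()
        self.logit_scale = nn.Parameter(torch.ones(1, num_classes, 1, 1))
        self.logit_bias = nn.Parameter(torch.zeros(1, num_classes, 1, 1))

    def forward(self, logits):
        # logits: (N, num_classes, H, W), e.g. (1, C, 512, 512);
        # the (1, C, 1, 1) parameters broadcast over N, H and W.
        return logits * self.logit_scale + self.logit_bias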

tonysy avatar Dec 09 '21 03:12 tonysy

Thank you for the reply. If I use this method for semantic segmentation and the input image size is (512, 512), should the parameter shape be (1, num_classes, 512, 512)? I think the shape (1, num_classes, 1, 1) in your code is meant for image classification. Is that correct?

Ianresearch avatar Dec 09 '21 03:12 Ianresearch

The code works for this scenario as well, since broadcasting is performed automatically. You can try this example:

>>> import torch
>>> data = torch.randn(1,10, 512,512)
>>> a = torch.ones(1,10,1,1)
>>> b = torch.zeros(1,10,1,1)
>>> out = a * data + b
>>> out.shape
torch.Size([1, 10, 512, 512])
>>>

tonysy avatar Dec 09 '21 03:12 tonysy

Hi tonysy, I used the method from this paper in my U-Net semantic segmentation, but the result did not improve. Is there something wrong in my implementation? The U-Net predicts 4 classes, the image size is 512*512, and the last layer is:

result = conv_bn_relu(inputchannel, outputchannel=4)

In the first stage, I train this U-Net with lr=0.02. In the second stage, I fix the parameters of the whole net except the last layer and add the following code:

# add the long-tail disalign adjustment
confidence = self.confidence_layer(result)  # self.confidence_layer = conv_bn_relu(inchannel=4, outchannel=1)
confidence = torch.sigmoid(confidence)
# only adjust the foreground classification scores
scores_tmp = confidence * (result * self.logit_scale + self.logit_bias)
result = scores_tmp + (1 - confidence) * result

Also in the second stage, I re-weight the cross-entropy loss using the method in Sec. 3.2.2 (ρ=0.3) and train the above net again with lr=0.02. Looking forward to your answer.

Ianresearch avatar Dec 13 '21 02:12 Ianresearch

First, in DisAlign, all layers learned in stage 1 are fixed during the second stage. Second, the proposed method mainly targets long-tailed recognition, which typically involves many tail classes.

Thus, you can fix all layers from stage 1, remove the confidence layer, and use only the GRW re-weighting for your case (confidence estimation brought only a minor improvement on the segmentation task in our recent experiments).

Such as:

result = result * self.logit_scale + self.logit_bias

Then learn only the logit_scale and logit_bias with the GRWCrossEntropyLoss: https://github.com/Megvii-BaseDetection/DisAlign/blob/a2fc3500a108cb83e3942293a5675c97ab3a2c6e/semantic_seg/disalign/models/losses/grw_cross_entropy_loss.py#L42
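
As a rough sketch only (the function name train_stage2, the calib module, and the use of a plain weighted cross entropy as a stand-in for GRWCrossEntropyLoss are assumptions, not this repository's code), the second stage could look like this:

import torch
import torch.nn as nn

def train_stage2(model, calib, loader, class_weights, lr=0.02, epochs=1):
    # Freeze everything learned in stage 1; only the per-class
    # logit_scale / logit_bias held by `calib` remain trainable.
    for p in model.parameters():
        p.requires_grad = False
    model.eval()

    # Weighted cross entropy as a stand-in for GRWCrossEntropyLoss;
    # `class_weights` would come from the GRW re-weighting scheme.
    criterion = nn.CrossEntropyLoss(weight=class_weights)
    optimizer = torch.optim.SGD([calib.logit_scale, calib.logit_bias], lr=lr)

    for _ in range(epochs):
        for images, labels in loader:       # labels: (N, H, W) class indices
            with torch.no_grad():
                logits = model(images)      # frozen stage-1 predictions
            logits = logits * calib.logit_scale + calib.logit_bias
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()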

tonysy avatar Dec 13 '21 03:12 tonysy

I will try again. The dataset has a long-tail distribution: the biggest class ratio is 74% and the smallest proportion is only 0.2% (74%, 11%, 14.8%, 0.2%). Thank you very much.
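
For reference, assuming the GRW weight of each class is proportional to its frequency to the power −ρ, normalized to sum to the number of classes (I am not sure this matches the exact formula in grw_cross_entropy_loss.py), the weights for my proportions with ρ=0.3 would be roughly:

import torch

freqs = torch.tensor([0.74, 0.11, 0.148, 0.002])  # class pixel proportions
rho = 0.3
w = freqs.pow(-rho)           # rarer classes get larger raw weights
w = w * len(freqs) / w.sum()  # normalize so the weights sum to 4
print(w)                      # approximately [0.39, 0.69, 0.63, 2.29]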

Ianresearch avatar Dec 13 '21 03:12 Ianresearch

Hi, the case you describe is imbalanced classification rather than long-tail. Long-tail means there are many tail classes (typically hundreds or thousands of classes in total).

tonysy avatar Dec 13 '21 03:12 tonysy

In this project, where is the code for imbalanced image classification, and which script should be used? @tonysy

Fly-dream12 avatar Mar 03 '22 07:03 Fly-dream12