
Extending the model for Semantic Segmentation task

Open NamburiSrinath opened this issue 2 years ago • 3 comments

Hi,

First of all, thanks for the repo. I am new to this area but was able to use the repo quite easily.

My downstream task is semantic segmentation and I am wondering if I can just change last layers to do the task.

In particular, this is what I did

  1. I have a large unlabeled dataset that I can successfully pass through the model (it accepts only 3-channel images and mine are grayscale, so I just stacked the same image across all three channels; I hope that's ok!). My understanding is that once this training is done, the pretext task is complete :)
  2. Now, for the downstream task, I am wondering how to change the last layer in 'improved-resnet.pth' and add new layers so that I can pass new data (a small quantity, with semantic labels) and finetune the model for segmentation. In particular, I don't know what layers to add for the segmentation task (removing the last Linear layer can be done using children()).
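
For step 1, a minimal sketch of the grayscale-to-3-channel stacking (assuming standard PyTorch tensors; `repeat` is just one way to do it):

```python
import torch

# A batch of grayscale images: (batch, 1, height, width)
gray = torch.randn(4, 1, 224, 224)

# Stack the single channel three times so the input matches
# the 3-channel images the ResNet backbone expects.
rgb_like = gray.repeat(1, 3, 1, 1)  # (4, 3, 224, 224)

# expand() gives the same shape without copying memory, which is
# fine as long as the tensor is not modified in place afterwards.
rgb_view = gray.expand(-1, 3, -1, -1)
```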

Any help would be greatly appreciated.

Thanks a lot

NamburiSrinath avatar Nov 10 '21 19:11 NamburiSrinath

You can use it as a backbone for a segmentation model just like a regular resnet for example : Mask RCNN with resnet as backbone

Spinkoo avatar Nov 12 '21 14:11 Spinkoo

Thanks Spinkoo for your comment.

I tried to add FCN head by making this model as backbone.

  1. First, I freeze the pretrained layers (so these weights will not be updated during the downstream task):
improved_resnet = torch.load('improved-resnet.pt')
for param in improved_resnet.parameters():
    param.requires_grad = False

  2. Then I took the FCN classifier head, replaced its final layer with a ConvTranspose2d, and attached the head to the end of the given resnet model:
fcn_model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet50', pretrained=False)
fcn_model.classifier[4] = torch.nn.ConvTranspose2d(512, 2, stride=(8, 8), kernel_size=(8, 8))
improved_resnet.avgpool = fcn_model.classifier
improved_resnet.fc = Identity()

Note: Because the dimensions mismatch somewhere around the pool/fc layers, I had to upconvolve.
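
As a sanity check on that upconvolution: a ConvTranspose2d with kernel 8 and stride 8 maps a 16×16 feature map to (16 − 1) · 8 + 8 = 128, which is where the [16, 2, 128, 128] output in the summary comes from. The layer in isolation:

```python
import torch
import torch.nn as nn

# The transpose conv from the snippet above, on its own.
up = nn.ConvTranspose2d(512, 2, kernel_size=(8, 8), stride=(8, 8))

# Backbone output at stride 32 for a 512x512 input.
feat = torch.randn(16, 512, 16, 16)
out = up(feat)

# Output size: (in - 1) * stride + kernel = (16 - 1) * 8 + 8 = 128
print(out.shape)  # torch.Size([16, 2, 128, 128])
```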

This is the model summary when a (16, 3, 512, 512) tensor is passed (batch size 16, 3 channels; only the last layers' output is shown for brevity):

│ │ └─ReLU: 3-148             [16, 2048, 16, 16]   --
├─FCNHead: 1-9                [16, 2, 128, 128]    --
│ └─Conv2d: 2-17              [16, 512, 16, 16]    9,437,184
│ └─BatchNorm2d: 2-18         [16, 512, 16, 16]    1,024
│ └─ReLU: 2-19                [16, 512, 16, 16]    --
│ └─Dropout: 2-20             [16, 512, 16, 16]    --
│ └─ConvTranspose2d: 2-21     [16, 2, 128, 128]    65,538
├─Identity: 1-10              [16, 32768]          --

I have the following doubts:

  1. Is my approach of adding a segmentation head on top of this model correct?
  2. I don't understand why the output is still being flattened even though I replaced the final layer with Identity()

NamburiSrinath avatar Nov 16 '21 16:11 NamburiSrinath

If my segmentor is an encoder-decoder, should I use the encoder as the online encoder and the decoder as the MLP?

luowei0701 avatar Sep 08 '22 12:09 luowei0701