HRNet-Semantic-Segmentation

How is multi-scale fusion performed?

Open kadattack opened this issue 3 years ago • 2 comments

How exactly is multi-scale fusion performed when combining the outputs from the different branches into one? I am asking about the step that happens AFTER the strided convolution and the upsampling have brought all of them to the same size. Does it do a simple element-wise sum of all the outputs? Or does it concatenate the outputs along the channel dimension?

kadattack, Jul 02 '21 16:07

You can see this in the forward pass of the HighResolutionNet module. After the lower-resolution outputs are upsampled by interpolation, the resulting tensors are concatenated along the channel dimension and then passed through the last_layer submodule, which consists of:

    self.last_layer = nn.Sequential(
        nn.Conv2d(
            in_channels=last_inp_channels,
            out_channels=last_inp_channels,
            kernel_size=1,
            stride=1,
            padding=0),
        BatchNorm2d(last_inp_channels, momentum=BN_MOMENTUM),
        nn.ReLU(inplace=relu_inplace),
        nn.Conv2d(
            in_channels=last_inp_channels,
            out_channels=config["arch"]["num_classes"],
            kernel_size=extra["FINAL_CONV_KERNEL"],
            stride=1,
            padding=1 if extra["FINAL_CONV_KERNEL"] == 3 else 0)
    )

There's a final interpolation to enforce that the output size = input size.
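
To make the shapes concrete, here is a minimal sketch of that head: upsample, concatenate, then apply a last_layer-style block. The branch channel widths, spatial sizes, and `num_classes` value are assumptions chosen for illustration, and the default BatchNorm momentum and a fixed 1x1 final kernel stand in for the config values, so treat this as a sketch rather than the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical branch outputs: four resolutions, illustrative channel widths.
channels = [18, 36, 72, 144]
branches = [
    torch.randn(1, c, 64 // 2**i, 64 // 2**i) for i, c in enumerate(channels)
]

# Upsample every lower-resolution branch to the highest-resolution size.
h, w = branches[0].shape[-2:]
upsampled = [branches[0]] + [
    F.interpolate(b, size=(h, w), mode="bilinear", align_corners=False)
    for b in branches[1:]
]

# Concatenate along the channel dimension (dim=1), not an element-wise sum.
feats = torch.cat(upsampled, dim=1)
last_inp_channels = sum(channels)  # assumed: total channels after concatenation

num_classes = 19  # assumption; the thread reads this from the config
last_layer = nn.Sequential(
    nn.Conv2d(last_inp_channels, last_inp_channels, kernel_size=1),
    nn.BatchNorm2d(last_inp_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(last_inp_channels, num_classes, kernel_size=1),
)

logits = last_layer(feats)
print(feats.shape, logits.shape)
```

Concatenation keeps every branch's channels side by side, so the first 1x1 convolution is what actually mixes information across resolutions.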

StuvX, Jul 04 '21 06:07

I'm very new to AI and PyTorch, but isn't this the code for the final output of the whole HRNet? I don't know if we are thinking about the same thing. Just to confirm, I'm talking about the merge process that happens repeatedly throughout the whole net, between the parallel branches.

From my understanding, this is done in the function at https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/f9fb1ba66ff8aea29d833b885f08df64e62c2b23/lib/models/hrnet.py#L207 . However, I'm still not experienced enough to understand what happens at the end of the forward() function: https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/f9fb1ba66ff8aea29d833b885f08df64e62c2b23/lib/models/hrnet.py#L277 It looks like it's adding the layers together? Am I looking at the wrong part of the code?
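
For comparison, here is a minimal sketch of what fusion by addition would look like for two branches, next to the concatenation alternative. The tensor shapes, channel counts, and layer names (`lo_to_hi`, `hi_to_lo`) are assumptions for illustration and are not taken from hrnet.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical two-branch example; sizes and channel counts are illustrative.
hi = torch.randn(1, 18, 32, 32)  # high-resolution branch
lo = torch.randn(1, 36, 16, 16)  # low-resolution branch

# Low -> high: 1x1 conv to match channels, followed by bilinear upsampling.
lo_to_hi = nn.Conv2d(36, 18, kernel_size=1, bias=False)
# High -> low: strided 3x3 conv halves the resolution and matches channels.
hi_to_lo = nn.Conv2d(18, 36, kernel_size=3, stride=2, padding=1, bias=False)

lo_up = F.interpolate(
    lo_to_hi(lo), size=hi.shape[-2:], mode="bilinear", align_corners=False
)

# Fusion by addition: each output branch is the element-wise sum of all
# inputs after they are brought to its resolution and channel count.
fused_hi = F.relu(hi + lo_up)
fused_lo = F.relu(lo + hi_to_lo(hi))

# Fusion by concatenation, for contrast: channels are stacked, not summed.
concat_hi = torch.cat(
    [hi, F.interpolate(lo, size=hi.shape[-2:], mode="bilinear", align_corners=False)],
    dim=1,
)

print(fused_hi.shape, fused_lo.shape, concat_hi.shape)
```

With addition, each branch keeps its own channel count after fusion, which is why the channel-matching convolutions are needed first; with concatenation the channel count grows instead.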

kadattack, Jul 05 '21 13:07