HRNet-Semantic-Segmentation

I am confused about the class channel dimension.

Open babbu3682 opened this issue 4 years ago • 5 comments

I am confused about the class channel dimension.

I looked at the Cityscapes dataset masks: the shape is (H, W) and there is no class dimension. The class is marked by the pixel value (e.g. 12 = human, 13 = bike, ...).

The segmentation models I worked with before HRNet used an input of shape (B, C, H, W).

But this HRNet seems to use an input of shape (B, H, W), where no one-hot encoding can be seen. Or is one-hot encoding unnecessary?

However, when I look at the Conv2d in the model's last layer and at its output, the class dimension does seem to exist. I have no idea how these fit together.


babbu3682, May 02 '20 02:05

The segmentation mask is a label map of class indices (one integer per pixel), so it does not need to be one-hot encoded. But when using a softmax loss, you need to generate predictions with num_classes channels.
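To make the shapes concrete, here is a minimal sketch, assuming PyTorch and its built-in `F.cross_entropy` (the repository's own loss class may differ in details). The tensor sizes are small hypothetical values, and the logits are random rather than real model output. It shows that a (B, H, W) index mask and an explicit (B, C, H, W) one-hot mask give the same softmax cross-entropy loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 19      # number of classes mentioned in this thread
B, H, W = 2, 8, 16    # tiny hypothetical sizes, for illustration only

# Model output: one channel of raw scores (logits) per class.
logits = torch.randn(B, num_classes, H, W)
# Label map: one integer class index per pixel, no class dimension.
target = torch.randint(0, num_classes, (B, H, W))

# cross_entropy applies log-softmax over dim=1 and selects the target class itself.
loss_index = F.cross_entropy(logits, target)

# The same loss written out with an explicit one-hot target of shape (B, C, H, W).
one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
loss_one_hot = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(logits.shape, target.shape)
print(torch.allclose(loss_index, loss_one_hot, atol=1e-6))
```

So the class dimension only has to exist on the prediction side; the index mask carries the same information as a one-hot tensor in a more compact form.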

sunke123, May 18 '20 07:05

softmax-loss ...?

babbu3682, May 19 '20 11:05

I have the same doubt. If the output of the model contains num_classes channels, how can I turn the output array into an image?

sakura-iv, Jul 13 '20 08:07

Does each output channel correspond to a category heat map? Like this?

EricHuiK, Aug 26 '20 07:08

In my understanding, if you feed an image of shape (1, 3, 512, 1024) into the model, you get an output of shape (1, 19, 128, 256). There are 19 classes in total, and each channel corresponds to one class. So we first upsample the output to the input size and then take the argmax over the channel dimension, which gives each pixel's final class as the channel with the highest probability. By assigning a color to each class, we get the segmentation image.
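Those steps can be sketched roughly as follows, assuming PyTorch. The logits here are random stand-ins for the real model output, and the palette is a hypothetical random one; a real script would use the dataset's official class colors.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 19

# Stand-in for the model output on a (1, 3, 512, 1024) image (random, not real predictions).
logits = torch.randn(1, num_classes, 128, 256)

# 1) Upsample the per-class score maps back to the input resolution.
logits_up = F.interpolate(logits, size=(512, 1024), mode="bilinear", align_corners=False)

# 2) Pick the highest-scoring channel per pixel: (1, 19, H, W) -> (1, H, W).
pred = logits_up.argmax(dim=1)

# 3) Look up one RGB color per class index (hypothetical random palette).
palette = torch.randint(0, 256, (num_classes, 3), dtype=torch.uint8)
color_image = palette[pred[0]]  # shape (H, W, 3), dtype uint8

print(pred.shape, color_image.shape)
```

The argmax can be taken on the raw logits, because softmax does not change which channel is largest. `color_image.numpy()` can then be saved with any image library.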

sakura-iv, Aug 26 '20 08:08