HRNet-Semantic-Segmentation
I am confused about the class channel dimension.
Looking at the Cityscapes dataset masks, the shape is (H, W) and there is no class dimension; the class is marked by pixel value (e.g. 12 = human, 13 = bike, ...).
The segmentation models I worked with before HRNet used targets of shape (B, C, H, W).
But this HRNet seems to use targets of shape (B, H, W), with no one-hot encoding in sight. Or is one-hot encoding unnecessary?
However, when I look at the Conv2d in the model's last layer and at its output, it seems that a class dimension does exist there. I have no idea what's going on.
The segmentation mask is a label map: each pixel stores a class index rather than a one-hot vector. But when using the softmax (cross-entropy) loss, you need to generate predictions with num_classes channels.
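As a minimal sketch (assuming PyTorch, which this repo uses), `nn.CrossEntropyLoss` takes integer class-index targets of shape (B, H, W) directly against logits of shape (B, num_classes, H, W), so a one-hot target tensor is never materialized; the shapes and `ignore_index` value below are just illustrative:

```python
import torch
import torch.nn as nn

num_classes = 19  # Cityscapes train classes

# Model output (logits): one channel per class
logits = torch.randn(1, num_classes, 128, 256)         # (B, num_classes, H, W)

# Target mask: integer class indices, no channel dimension
target = torch.randint(0, num_classes, (1, 128, 256))  # (B, H, W)

# CrossEntropyLoss applies log-softmax internally and picks out the
# target class per pixel -- no one-hot encoding is needed.
criterion = nn.CrossEntropyLoss(ignore_index=255)
loss = criterion(logits, target)
print(loss.shape)  # torch.Size([]) -- a scalar
```

This is why the dataset masks stay (H, W) while the network output carries the class channel.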
softmax-loss ...?
I also have this doubt. If the output of the model contains num_classes channels, how can I turn the output array into an image?
Does each output channel correspond to a category heat map? Like this?
In my understanding: if you feed an image of shape (1, 3, 512, 1024) into the model, you get an output of shape (1, 19, 128, 256), because there are 19 classes in total and each channel corresponds to one class. We first upsample the output back to the input resolution, then apply an argmax over the class channel: each pixel's final class is the channel with the highest score. By assigning a color to each class, we get the segmentation image.