About the deep supervision function
Hello, I have a question about computing the loss between [out7, out8, out9] and the GT mask. In the original code, out7 has an output shape of (None, 56, 56, 4), out8 of (None, 112, 112, 4), and out9 of (None, 224, 224, 4). To my knowledge, when computing the loss between a segmentation map and a mask, shouldn't the output and the mask have the same dimensions (i.e., all of them with shape (None, 224, 224, 4))?
Hi limyunn,
the predicted masks and the ground truth masks do have the same shapes. The model predicts masks at the dimensions you describe, and the generator produces ground truth masks at those same dimensions. Try this to convince yourself:
# Fetch one batch and inspect the shapes of the three ground-truth masks
_, masks = next(train_generator)
print(masks[0].shape, masks[1].shape, masks[2].shape)
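Deep supervision then just means each head is trained against the mask at its own scale and the three losses are combined. Below is a minimal Keras sketch of how such a multi-output model could be compiled; the loss function and weights are illustrative assumptions, not the repository's actual settings, and model is assumed to expose [out7, out8, out9] as its outputs.

# Hedged sketch: one loss per output head, summed with hypothetical weights
model.compile(
    optimizer="adam",
    loss=["categorical_crossentropy"] * 3,  # one loss each for out7, out8, out9
    loss_weights=[0.25, 0.5, 1.0],          # illustrative; favors the full-resolution head
)
model.fit(train_generator, epochs=10)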
Thanks! I'm just wondering whether directly upscaling the out7 and out8 feature maps to the original resolution with an upsampling layer would have any influence on the performance of the model?
In our experiments, using different scales works better. You can of course try an upsampling layer or a transpose convolution; for example, check monodepth2, especially its "Multi-scale Estimation" section. It depends on your project. Also, depending on your image resolution, you might want to consider using more or fewer outputs.
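If you want to experiment with that, here is a minimal sketch of both options in Keras, assuming out7 and out8 have the shapes discussed above (the Input placeholders below stand in for the real decoder tensors):

import tensorflow as tf
from tensorflow.keras import layers

# Stand-ins for the intermediate heads; in the real model these come from the decoder
out7 = tf.keras.Input(shape=(56, 56, 4))
out8 = tf.keras.Input(shape=(112, 112, 4))

# Option 1: fixed bilinear upsampling to the full 224x224 resolution (56 * 4 = 224)
out7_up = layers.UpSampling2D(size=4, interpolation="bilinear")(out7)

# Option 2: learned upsampling with a transpose convolution (stride 2: 112 -> 224)
out8_up = layers.Conv2DTranspose(filters=4, kernel_size=4, strides=2, padding="same")(out8)

print(out7_up.shape, out8_up.shape)  # both (None, 224, 224, 4)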
Hello Thanos-DB, regarding the outputs of out7, out8, and out9 in FCTNet, which produce three sets of feature maps: how are they ultimately fused together? Is it by upsampling to match the shape of out9? And for a binary task, would I need to merge the three sets of feature maps? Thank you!
I have the same question. Regarding the outputs of out7, out8, and out9 in FCTNet: they produce three sets of feature maps, so how are they ultimately merged together?