
About Semantic Segmentation

Open YunZhou0321 opened this issue 5 months ago • 3 comments

Thank you very much for your outstanding work! I am focusing on the semantic segmentation task. In the OneFormer3D code, I noticed that semantic segmentation is trained by predicting superpoint categories and applying a cross-entropy loss against the ground-truth superpoint labels. In my experiments, I removed the queries and the loss terms related to instance segmentation. It's worth noting that semantic segmentation relies on the pretrained weights from SSTNet. When I trained the model from scratch without loading the SSTNet pretrained weights (i.e., with randomly initialized weights), the model still converged, but the mIoU was lower than with the pretrained weights. What do you think could be the reason for this? Is it because the spconvUnet has already been trained with the semantic loss L_semantic described in its paper? In the SSTNet paper, L_semantic is computed at the point level as a combination of the Dice loss and the cross-entropy loss.
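For context, the point-level L_semantic described above (SSTNet-style cross-entropy plus soft Dice) could be sketched roughly as follows. This is only an illustrative sketch, not the actual OneFormer3D/SSTNet implementation; the function name and the equal default weights are my own assumptions.

```python
import torch
import torch.nn.functional as F

def semantic_loss(logits, labels, num_classes, ce_weight=1.0, dice_weight=1.0):
    """Hypothetical sketch of an SSTNet-style point-level semantic loss:
    cross-entropy plus a multi-class soft Dice term.

    logits: [N_points, num_classes] raw scores
    labels: [N_points] ground-truth class indices
    """
    # Standard point-wise cross-entropy.
    ce = F.cross_entropy(logits, labels)

    # Soft Dice over per-class probability masks.
    probs = F.softmax(logits, dim=1)                    # [N, C]
    one_hot = F.one_hot(labels, num_classes).float()    # [N, C]
    inter = (probs * one_hot).sum(dim=0)                # per-class intersection
    denom = probs.sum(dim=0) + one_hot.sum(dim=0)       # per-class cardinality
    dice = 1.0 - (2.0 * inter / denom.clamp(min=1e-6)).mean()

    return ce_weight * ce + dice_weight * dice
```

The weighting between the two terms (and whether the Dice term is averaged only over classes present in the scene) would need to be checked against the SSTNet code.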

This is the performance without loading the SSTNet weights:

Image

This is the performance with the SSTNet weights loaded:

Image

Many thanks in advance for your kind support and valuable feedback.

YunZhou0321 avatar Jul 18 '25 07:07 YunZhou0321

I use 8 × RTX 3090 GPUs for training.

YunZhou0321 avatar Jul 18 '25 07:07 YunZhou0321

I don't have a good answer here. We also noticed that random initialization is not good enough for either instance or semantic segmentation. I'm not sure whether there is anything particularly good about the SSTNet weights, or whether the backbone could just as well be initialized from some U-Net semantic segmentation pre-training.

filaPro avatar Jul 18 '25 09:07 filaPro

Thank you very much for your reply. I really appreciate your insights and the time you took to respond.

YunZhou0321 avatar Jul 18 '25 10:07 YunZhou0321