
What does superpoint mean?

Open seanzhuh opened this issue 11 months ago • 6 comments

Hi, I'm confused by the superpoints argument used in the predict_by_feat_instances() function in oneformer3d.py.

In line 74, the scores are flattened. Before this flattening, the scores have shape S x 18 (ScanNet) or S x 198 (ScanNet200), since those datasets have 18 and 198 instance classes respectively, and S is the number of superpoints, which are preprocessed and stored in line 87 of batch_load_scannet_data.py. After this flattening and the topk operation, the same superpoint can end up with multiple predicted instance classes, if I understand correctly: because topk is performed after flattening, one superpoint may hold both the largest and the second-largest score across all S x 18/198 values, in which case that superpoint gets distinct labels (line 76).

But line 78 reduces topk_idx from the range S * 18/198 back to S, meaning that even if one superpoint can have multiple predicted classes, it can only have one unique predicted mask, selected from out['masks'][0] of shape S x S.

In my understanding, one superpoint can cover multiple instances of different classes, so why do they share the same mask? Is something wrong here?

seanzhuh avatar Feb 13 '25 11:02 seanzhuh

We have S different predicted instance masks, and S * n_classes scores for them. We select the topk best scores from these S * n_classes values, then select the topk masks corresponding to those scores. Since the number of masks is n_classes times smaller than the number of scores, the same mask can correspond to several scores.
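If it helps, here is a minimal PyTorch sketch of the flatten-and-topk selection described above. The shapes and variable names are toy illustrations, not the actual oneformer3d code:

```python
import torch

# Toy sizes: S superpoint queries, n_classes per-query class scores.
S, n_classes, topk = 5, 3, 4

scores = torch.rand(S, n_classes)   # per-mask class scores, shape S x n_classes
masks = torch.rand(S, S) > 0.5      # one binary mask per query, shape S x S

# Flatten and take topk over all S * n_classes (mask, class) pairs.
flat_scores, flat_idx = scores.flatten().topk(topk)

# Recover the label and the mask index for each selected score
# (flatten is row-major, so flat index = mask_idx * n_classes + label).
labels = flat_idx % n_classes       # predicted class per selection
mask_idx = flat_idx // n_classes    # which mask each selection reuses
selected_masks = masks[mask_idx]    # topk masks; duplicates are possible

# The same mask_idx can appear more than once with different labels,
# which is exactly why one mask can carry several (score, label) pairs.
```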

filaPro avatar Feb 13 '25 12:02 filaPro

Yeah, but why can the same mask correspond to several scores and different labels? If a mask corresponds to a single instance, it should have only one label and one score.

seanzhuh avatar Feb 13 '25 12:02 seanzhuh

What drives me to this is that I use another pretrained 3D recognition model for classification, rather than using out['cls_preds'][0] from the linear layer trained on the ScanNet dataset.

To do this, I just replace the predicted instance labels and scores with my own logits, i.e., data_sample.pred_pts_seg.instance_labels = my_own_logits.max(dim=-1)[1] and data_sample.pred_pts_seg.instance_scores = my_own_logits.max(dim=-1)[0]. To get my_own_logits, I use the predicted instance masks to crop the original point cloud into the points of each predicted instance, then feed each instance's points separately to a pretrained model to get the logits. my_own_logits has shape data_sample.pred_pts_seg.pts_instance_mask[0].shape[0] x 18/198 for ScanNet/ScanNet200, which is compatible with the learned logits.

But I only get an AP_all of merely 2.4. However, when I use the ground-truth instance points, i.e., data_sample.gt_pts_seg.pts_instance_mask instead of the predicted pts_instance_mask, I get Acc@1 of 45, which suggests the classification itself is actually not too bad. So I wonder: am I using the predicted instance masks to crop the instance points correctly?
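For reference, the cropping step I describe could be sketched like this, assuming the predicted masks form a binary (n_instances, n_points) tensor. The shapes and names are illustrative, not the actual data_sample fields:

```python
import torch

# Illustrative shapes only: N points with xyz+rgb, n_inst predicted instances.
N, n_inst = 100, 4
points = torch.rand(N, 6)                  # original point cloud
pred_masks = torch.rand(n_inst, N) > 0.7   # binary mask per predicted instance

# Crop the point cloud with each predicted mask; each crop would then be
# fed separately to the external pretrained classifier to get its logits.
instance_points = [points[m] for m in pred_masks]
```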

seanzhuh avatar Feb 13 '25 12:02 seanzhuh

but why the same mask can correspond to several scores and different labels

As I remember, it just has a minor positive impact on the mAP metric (maybe around 1% or even less). You can use the top-1 prediction per superpoint for any practical application.
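A top-1-per-mask variant (the practical alternative suggested above) could look like this toy sketch, where each mask keeps only its single best class instead of entering the flattened topk:

```python
import torch

S, n_classes = 5, 3
scores = torch.rand(S, n_classes)  # one score vector per predicted mask

# Keep a single best (score, label) pair per mask, so every mask
# appears at most once and carries exactly one label.
top1_scores, top1_labels = scores.max(dim=-1)
```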

filaPro avatar Feb 13 '25 12:02 filaPro

Thank you, could you share some hints on this? I've been debugging it for many days.

seanzhuh avatar Feb 13 '25 12:02 seanzhuh

Is it because my_own_logits gives the same label for the same mask, while yours can give different labels for the same mask?

seanzhuh avatar Feb 13 '25 12:02 seanzhuh