Object masks for scenes in graspnetAPI

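Hello! First of all, thank you for your work and for your careful, patient answers. I would like to know how the mask images at scenes/scene_${id}/${camera}/Label/${id}.png in the dataset were obtained, for example the images under scenes/scene_0100/kinect/Label/. They look like masks of the objects in the scene?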

Apr 10 '25 03:04 Hriver-J

@Hriver-J @Fang-Haoshu @cww97 @cubercsl @chenxi-wang Yes, they are grayscale segmentation images, but I'm wondering whether there is a mapping between the segmentation mask values and the object labels, because the grayscale pixel values in the mask do not seem to correspond directly to the object IDs.
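One way to check this is to list the distinct pixel values in a label image and compare them by hand with the object IDs of the scene. The snippet below is only a sketch: `summarize_mask` is a hypothetical helper, and the array is a synthetic stand-in for a real label image, so the values 1 and 15 are made up for illustration and say nothing about the actual encoding.

```python
import numpy as np

def summarize_mask(mask):
    """Return {pixel_value: pixel_count} for a 2D integer label image."""
    values, counts = np.unique(mask, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}

# Synthetic stand-in for a label image (illustrative values only).
mask = np.zeros((4, 6), dtype=np.uint8)
mask[0:2, 0:2] = 1     # pretend object A
mask[2:4, 3:6] = 15    # pretend object B

summary = summarize_mask(mask)
print(summary)
```

For a real file, the array could be loaded with something like `np.array(Image.open(path))` from Pillow, assuming the PNG is a single-channel image.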

Apr 10 '25 19:04 minghu0830

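@minghu0830 Do you know of any general-purpose, convenient ways to obtain masks like these? The authors have 50,000+ such images, and they appear to correspond one-to-one with the RGB images captured by the camera, so there seem to be tools that can extract masks from RGB images.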

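By object labels, do you mean per-object grasp labels such as 002_labels.npz? My thinking is this: in https://github.com/graspnet/graspnetAPI/issues/42#issuecomment-1578094386 chenxi-wang mentioned that the scenes were reconstructed using the object point cloud models and 6D poses. In that description, object information and scene information are used together, as if the object point cloud coordinates, grasp results, and so on were transformed into the scene, and that step must use the object IDs. If the RGB images were also used, then the IDs and the seg masks would be linked.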

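However, once the object point clouds are fixed, the object boundaries in the image should be directly computable without relying on the seg mask, so I lean toward thinking that the mapping between grasp labels and seg masks is not important. Given GraspNet's ability to grasp general objects, one purpose of the seg mask may be to let the model learn an image-based notion of "object boundary", so that it can recognize and detect boundaries when it encounters unseen objects.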
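To make the "boundaries can be computed from the point cloud" idea concrete, here is a rough sketch of projecting an object's model points into the image with a 6D pose and pinhole intrinsics. This is my own guess at how such a mask could be produced, not the authors' confirmed pipeline. `project_mask`, the intrinsics `K`, and the pose are all hypothetical, and the sketch ignores occlusion between objects and the holes that a sparse point cloud would leave.

```python
import numpy as np

def project_mask(points_obj, pose, K, height, width):
    """Rasterize object-model points into a binary image mask.

    points_obj: (N, 3) points in the object frame.
    pose:       (4, 4) object-to-camera transform (the 6D pose).
    K:          (3, 3) pinhole camera intrinsics.
    """
    # Transform points from the object frame into the camera frame.
    pts_cam = points_obj @ pose[:3, :3].T + pose[:3, 3]
    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Perspective projection to pixel coordinates.
    uv = pts_cam @ K.T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    # Discard projections that fall outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    mask = np.zeros((height, width), dtype=bool)
    mask[v[inside], u[inside]] = True
    return mask

# Toy example: made-up intrinsics, object placed 1 m in front of the camera.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[2, 3] = 1.0
points = np.array([[0.0, 0.0, 0.0],    # projects to the principal point
                   [0.1, 0.0, 0.0],    # projects 10 px to the right
                   [10.0, 0.0, 0.0]])  # falls outside the image
demo_mask = project_mask(points, pose, K, height=48, width=64)
print(demo_mask.sum())
```

Writing a different integer per object instead of `True` would give a label image of the same kind as the seg mask, which is where an ID-to-pixel-value mapping would come from if the masks were rendered this way.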

Apr 11 '25 01:04 Hriver-J