Add support for importing semantics from https://3dscenegraph.stanford.edu for use with the Gibson dataset

Open msbaines opened this issue 5 years ago • 9 comments

🚀 Feature

We want to be able to use the semantic dataset from https://3dscenegraph.stanford.edu/ with the scene dataset from http://gibsonenv.stanford.edu/

msbaines avatar Dec 10 '19 21:12 msbaines

The 3dscenegraph semantic dataset is currently limited to gibson_tiny. However, semantics for gibson_medium are expected to be released soon.

msbaines avatar Dec 10 '19 21:12 msbaines

The mesh used for semantics is different from the mesh used for Habitat. The coordinate systems of the two meshes also differ: the Y and Z axes are switched, but the origin is the same. We should be able to generate a semantic .ply mesh from the original .obj mesh by transforming vertex coordinates.
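As a rough sketch of that conversion (hypothetical helper name; only the vertex records are touched, and the exact sign convention of the axis swap still needs verification):

```python
def swap_obj_yz(obj_path, out_path):
    # Hypothetical helper: swap the Y and Z coordinates of every vertex
    # in a Wavefront .obj file, leaving faces and other records as-is.
    # A sign flip on one axis may also be needed, depending on handedness.
    with open(obj_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith("v "):
                _, x, y, z = line.split()[:4]
                fout.write(f"v {x} {z} {y}\n")
            else:
                fout.write(line)
```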

msbaines avatar Dec 10 '19 21:12 msbaines

The semantic data from 3DSceneGraph is available via an npz file.

To access the data:

import numpy as np

data = np.load(npz_path, allow_pickle=True)['output'].item()

This will return the following dictionary:

dict_keys(['building', 'room', 'object', 'camera', 'panorama'])

building

>>> pprint.pprint(data['building'])
{'floor_area': 35.04333970662052,
 'function': 'residential',
 'gibson_split': 'tiny',
 'id': 2,
 'name': 'Allensville',
 'num_cameras': 26,
 'num_floors': 1,
 'num_objects': 33,
 'num_rooms': 11,
 'object_inst_segmentation': array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]]),
 'object_voxel_occupancy': array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]]),
 'reference_point': array([0., 0., 0.]),
 'room_inst_segmentation': array([[10.],
       [ 8.],
       [ 8.],
       ...,
       [11.],
       [11.],
       [11.]]),
 'room_voxel_occupancy': array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]]),
 'size': array([9.76260114, 8.85856447, 2.50682107]),
 'volume': 201.63530725199752,
 'voxel_centers': array([[-0.98990329, -0.97658977, -0.02014603],
       [-0.98990329, -0.97658977,  0.07985397],
       [-0.98990329, -0.97658977,  0.17985397],
       ...,
       [ 8.71009671,  7.82341023,  2.27985397],
       [ 8.71009671,  7.82341023,  2.37985397],
       [ 8.71009671,  7.82341023,  2.47985397]]),
 'voxel_resolution': array([98, 89, 26]),
 'voxel_size': 0.1}

room

>>> pprint.pprint(data['room'])
{1: {'floor_area': 8.73437848991798,
     'floor_number': 'A',
     'id': 1,
     'location': array([3.53988  , 0.2945975, 1.116783 ]),
     'parent_building': 2,
     'scene_category': 'bathroom',
     'size': array([2.5752  , 2.370445, 2.254074]),
     'volume': 10.48092837735436},
 2: {'floor_area': 9.826049281845457,
     'floor_number': 'A',
     'id': 2,
     'location': array([0.42217  , 2.404948 , 1.1276055]),
     'parent_building': 2,
     'scene_category': 'bathroom',
     'size': array([2.88698 , 2.927084, 2.259769]),
     'volume': 13.84569568093703},
 3: {'floor_area': 11.789331640706246,
     'floor_number': 'A',
     'id': 3,
     'location': array([6.99981 , 0.605225, 1.24227 ]),
     'parent_building': 2,
     'scene_category': 'bedroom',
     'size': array([3.39914, 3.06071, 2.48226]),
     'volume': 23.095719066402776},

 ...

 11: {'floor_area': 7.659161263755288,
      'floor_number': 'A',
      'id': 11,
      'location': array([ 0.2298935, -0.0203395,  1.2285965]),
      'parent_building': 2,
      'scene_category': 'lobby',
      'size': array([2.361493, 1.971701, 2.453287]),
      'volume': 9.14560537370886}}

object

>>> pprint.pprint(data['object'])
{1: {'action_affordance': ['open', 'close', 'cook', 'heat', 'defrost', 'clean'],
     'class_': 'microwave',
     'floor_area': 2.826599475275465,
     'id': 1,
     'location': array([2.83998585, 4.76085063, 1.49223023]),
     'material': ['glass', 'metal'],
     'parent_room': 9,
     'size': array([0.40677453, 1.2802279 , 0.45474387]),
     'surface_coverage': 0.6978848300032634,
     'tactile_texture': None,
     'visual_texture': None,
     'volume': 0.08689193617144757},
 2: {'action_affordance': ['open',
                           'close',
                           'heat',
                           'turn on',
                           'turn off',
                           'clean'],
     'class_': 'oven',
     'floor_area': 3.1440354889034574,
     'id': 2,
     'location': array([2.98861606, 4.78304369, 0.46367262]),
     'material': ['metal', 'glass'],
     'parent_room': 9,
     'size': array([0.7124521 , 1.00192841, 0.94029514]),
     'surface_coverage': 1.3838881855100549,
     'tactile_texture': None,
     'visual_texture': None,
     'volume': 0.32710032579657},
 3: {'action_affordance': ['wash', 'clean'],
     'class_': 'sink',
     'floor_area': 1.7597120848145011,
     'id': 3,
     'location': array([ 4.23522156, -0.57456161,  0.91402512]),
     'material': ['ceramic', None],
     'parent_room': 1,
     'size': array([0.57416017, 0.54074392, 0.17042408]),
     'surface_coverage': 0.2042751198409106,
     'tactile_texture': None,
     'visual_texture': None,
     'volume': 0.014058957460380607},

 ...

 33: {'action_affordance': ['sit at',
                            'lay on',
                            'pick up',
                            'move',
                            'clean',
                            'set',
                            'decorate'],
      'class_': 'dining table',
      'floor_area': 3.473003668596787,
      'id': 33,
      'location': array([4.48357247, 6.70686119, 0.5614044 ]),
      'material': ['wood', None],
      'parent_room': 8,
      'size': array([1.14685995, 0.68447991, 0.6124021 ]),
      'surface_coverage': 1.3836484369355602,
      'tactile_texture': None,
      'visual_texture': None,
      'volume': 0.27629888509653355}}

camera

>>> pprint.pprint(data['camera'])
{1: {'FOV': 1.0489180166567196,
     'id': 1,
     'location': array([6.19820356, 4.94441748, 1.27608538]),
     'modality': 'RGB',
     'name': 'point_0_view_0',
     'parent_room': 11,
     'resolution': array([1024, 1024]),
     'rotation': [1.616633415222168, -0.01483128871768713, 1.8443574905395508]},

 ...

 2863: {'FOV': 1.0376312872786024,
        'id': 2863,
        'location': array([3.41174531, 6.73860884, 1.23510659]),
        'modality': 'RGB',
        'name': 'point_9_view_4',
        'parent_room': 11,
        'resolution': array([1024, 1024]),
        'rotation': [1.9443336725234985,
                     0.0011426351265981793,
                     -1.8368979692459106]}}

panorama

>>> pprint.pprint(data['panorama'])
{'p000001': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16),
             'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16)},
 'p000002': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16),
             'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16)},

 ...

 'p000026': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16),
             'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int16)}}

msbaines avatar Dec 11 '19 19:12 msbaines

I was hoping to load the .npz file in C++ using cnpy, but the .npz contains pickled data, which cnpy can't handle. I could potentially handle the pickled data using http://www.picklingtools.com/.

Alternatively, I could do all the processing in Python, but the Python tools for writing a mesh are cumbersome and likely slow. So for now, I think I will write a Python script to convert the data I need into a format that can be easily loaded in C++ and then do the processing in C++.

msbaines avatar Dec 11 '19 19:12 msbaines

The semantic mask information is located in:

data['building']['object_inst_segmentation']

which is an array with one element per face. Each element contains the semantic object_id for that face. If we don't have semantic information for a face, the id is 0.

The format of the array:

>>> type(data['building']['object_inst_segmentation'])
<class 'numpy.ndarray'>
>>> type(data['building']['object_inst_segmentation'][0])
<class 'numpy.ndarray'>
>>> data['building']['object_inst_segmentation'][0]
array([0.])
>>> data['building']['object_inst_segmentation'][0][0]
0.0
>>> type(data['building']['object_inst_segmentation'][0][0])
<class 'numpy.float64'>

msbaines avatar Dec 11 '19 21:12 msbaines

We should be able to write out the object ids using the following code:

import numpy as np

object_ids = data['building']['object_inst_segmentation']
with open("out.bin", "wb") as f:
    f.write(object_ids.astype(np.int16).tobytes())
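To sanity-check the dump, the int16 ids can be read straight back with NumPy (a round-trip sketch; the object_ids array below is a stand-in for the real npz data):

```python
import numpy as np

# Stand-in for data['building']['object_inst_segmentation']
object_ids = np.array([[0.], [3.], [17.]])
object_ids.astype(np.int16).tofile("out.bin")

# One int16 per face, in face order; trivially readable from C++ as well.
ids = np.fromfile("out.bin", dtype=np.int16)
```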

msbaines avatar Dec 11 '19 22:12 msbaines

Bounding box information:

Looking at the object schema:

 33: {'action_affordance': ['sit at',
                            'lay on',
                            'pick up',
                            'move',
                            'clean',
                            'set',
                            'decorate'],
      'class_': 'dining table',
      'floor_area': 3.473003668596787,
      'id': 33,
      'location': array([4.48357247, 6.70686119, 0.5614044 ]),
      'material': ['wood', None],
      'parent_room': 8,
      'size': array([1.14685995, 0.68447991, 0.6124021 ]),
      'surface_coverage': 1.3836484369355602,
      'tactile_texture': None,
      'visual_texture': None,
      'volume': 0.27629888509653355}}

location and size may be the axis-aligned bounding box. This will have to be verified.
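One way to check that guess: compare an object's location/size against the axis-aligned bounding box of the vertices of its faces. A minimal sketch, assuming vertices is an (N, 3) vertex array, faces an (M, 3) face-index array from the Gibson mesh, and face_ids the per-face segmentation array (all hypothetical names here):

```python
import numpy as np

def matches_aabb(obj, vertices, faces, face_ids, tol=0.05):
    # Hypothetical check: gather the vertices of all faces labeled with this
    # object's id and compare their axis-aligned bounding box against the
    # object's 'location' (center) and 'size' (extent).
    verts = vertices[faces[face_ids.ravel() == obj['id']].ravel()]
    lo, hi = verts.min(axis=0), verts.max(axis=0)
    return (np.allclose((lo + hi) / 2.0, obj['location'], atol=tol) and
            np.allclose(hi - lo, obj['size'], atol=tol))
```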

msbaines avatar Dec 16 '19 18:12 msbaines

Semantics seem to be working with the following transformation:

x1 = x0, y1 = -z0, z1 = y0
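In vectorized NumPy form (hypothetical helper name), that transform is:

```python
import numpy as np

def gibson_to_habitat(verts):
    # Apply x1 = x0, y1 = -z0, z1 = y0 to an (..., 3) array of vertices.
    x0, y0, z0 = verts[..., 0], verts[..., 1], verts[..., 2]
    return np.stack([x0, -z0, y0], axis=-1)
```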

msbaines avatar Dec 16 '19 18:12 msbaines

Hello, I want to know whether we can get the rooms' centers and bounding boxes from Habitat for the Gibson dataset. I used 3DSceneGraph as the Gibson semantics but only get the SemanticObject class. Thanks!

ybgdgh avatar Sep 15 '20 01:09 ybgdgh