habitat-sim
Add support for importing semantics from https://3dscenegraph.stanford.edu for use with the Gibson dataset
🚀 Feature
We want to be able to use the semantic dataset from https://3dscenegraph.stanford.edu/ with the scene dataset from http://gibsonenv.stanford.edu/.
The 3DSceneGraph semantic dataset is currently limited to gibson_tiny. However, semantics for gibson_medium are expected to be released soon.
The mesh used for semantics is different from the mesh used by Habitat. The coordinate systems of the two meshes also differ: the Y and Z axes are swapped, but the origin is the same. We should be able to generate a semantic .ply mesh from the original .obj mesh by transforming the vertex coordinates, as sketched below.
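For example, if the Gibson .obj vertices are stacked into an (N, 3) numpy array, the swap might look like this (a sketch only; the sign flip on Z matches the transformation reported near the end of this issue, and still needs to be verified against renders):

import numpy as np

def gibson_to_habitat(verts):
    # Tentative: exchange the Y and Z axes, keeping the origin fixed.
    x, y, z = verts[:, 0], verts[:, 1], verts[:, 2]
    return np.stack([x, -z, y], axis=1)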
The semantic data from 3DSceneGraph is available via an npz file.
To access the data:
import numpy as np
data = np.load(npz_path, allow_pickle=True)['output'].item()
This will return the following dictionary:
dict_keys(['building', 'room', 'object', 'camera', 'panorama'])
building
>>> pprint.pprint(data['building'])
{'floor_area': 35.04333970662052,
'function': 'residential',
'gibson_split': 'tiny',
'id': 2,
'name': 'Allensville',
'num_cameras': 26,
'num_floors': 1,
'num_objects': 33,
'num_rooms': 11,
'object_inst_segmentation': array([[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]]),
'object_voxel_occupancy': array([[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]]),
'reference_point': array([0., 0., 0.]),
'room_inst_segmentation': array([[10.],
[ 8.],
[ 8.],
...,
[11.],
[11.],
[11.]]),
'room_voxel_occupancy': array([[0.],
[0.],
[0.],
...,
[0.],
[0.],
[0.]]),
'size': array([9.76260114, 8.85856447, 2.50682107]),
'volume': 201.63530725199752,
'voxel_centers': array([[-0.98990329, -0.97658977, -0.02014603],
[-0.98990329, -0.97658977, 0.07985397],
[-0.98990329, -0.97658977, 0.17985397],
...,
[ 8.71009671, 7.82341023, 2.27985397],
[ 8.71009671, 7.82341023, 2.37985397],
[ 8.71009671, 7.82341023, 2.47985397]]),
'voxel_resolution': array([98, 89, 26]),
'voxel_size': 0.1}
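The voxel fields appear to describe a dense grid: in the dump above the z coordinate of voxel_centers varies fastest, which suggests a C-ordered (x, y, z) layout with prod(voxel_resolution) rows. A sketch of reshaping it (the layout is an assumption worth checking):

res = tuple(data['building']['voxel_resolution'])              # e.g. (98, 89, 26)
centers = data['building']['voxel_centers'].reshape(res + (3,))
room_occ = data['building']['room_voxel_occupancy'].reshape(res)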
room
>>> pprint.pprint(data['room'])
{1: {'floor_area': 8.73437848991798,
'floor_number': 'A',
'id': 1,
'location': array([3.53988 , 0.2945975, 1.116783 ]),
'parent_building': 2,
'scene_category': 'bathroom',
'size': array([2.5752 , 2.370445, 2.254074]),
'volume': 10.48092837735436},
2: {'floor_area': 9.826049281845457,
'floor_number': 'A',
'id': 2,
'location': array([0.42217 , 2.404948 , 1.1276055]),
'parent_building': 2,
'scene_category': 'bathroom',
'size': array([2.88698 , 2.927084, 2.259769]),
'volume': 13.84569568093703},
3: {'floor_area': 11.789331640706246,
'floor_number': 'A',
'id': 3,
'location': array([6.99981 , 0.605225, 1.24227 ]),
'parent_building': 2,
'scene_category': 'bedroom',
'size': array([3.39914, 3.06071, 2.48226]),
'volume': 23.095719066402776},
...
11: {'floor_area': 7.659161263755288,
'floor_number': 'A',
'id': 11,
'location': array([ 0.2298935, -0.0203395, 1.2285965]),
'parent_building': 2,
'scene_category': 'lobby',
'size': array([2.361493, 1.971701, 2.453287]),
'volume': 9.14560537370886}}
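Combined with room_inst_segmentation from the building entry (one room id per face, by the same convention as the object segmentation), this gives a per-face room labeling. A quick sketch counting faces per room category, assuming data is loaded as above:

import numpy as np

room_ids = data['building']['room_inst_segmentation'].ravel().astype(int)
ids, counts = np.unique(room_ids, return_counts=True)
for rid, n in zip(ids, counts):
    rid = int(rid)
    name = data['room'][rid]['scene_category'] if rid in data['room'] else 'unlabeled'
    print(rid, name, n)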
object
>>> pprint.pprint(data['object'])
{1: {'action_affordance': ['open', 'close', 'cook', 'heat', 'defrost', 'clean'],
'class_': 'microwave',
'floor_area': 2.826599475275465,
'id': 1,
'location': array([2.83998585, 4.76085063, 1.49223023]),
'material': ['glass', 'metal'],
'parent_room': 9,
'size': array([0.40677453, 1.2802279 , 0.45474387]),
'surface_coverage': 0.6978848300032634,
'tactile_texture': None,
'visual_texture': None,
'volume': 0.08689193617144757},
2: {'action_affordance': ['open',
'close',
'heat',
'turn on',
'turn off',
'clean'],
'class_': 'oven',
'floor_area': 3.1440354889034574,
'id': 2,
'location': array([2.98861606, 4.78304369, 0.46367262]),
'material': ['metal', 'glass'],
'parent_room': 9,
'size': array([0.7124521 , 1.00192841, 0.94029514]),
'surface_coverage': 1.3838881855100549,
'tactile_texture': None,
'visual_texture': None,
'volume': 0.32710032579657},
3: {'action_affordance': ['wash', 'clean'],
'class_': 'sink',
'floor_area': 1.7597120848145011,
'id': 3,
'location': array([ 4.23522156, -0.57456161, 0.91402512]),
'material': ['ceramic', None],
'parent_room': 1,
'size': array([0.57416017, 0.54074392, 0.17042408]),
'surface_coverage': 0.2042751198409106,
'tactile_texture': None,
'visual_texture': None,
'volume': 0.014058957460380607},
...
33: {'action_affordance': ['sit at',
'lay on',
'pick up',
'move',
'clean',
'set',
'decorate'],
'class_': 'dining table',
'floor_area': 3.473003668596787,
'id': 33,
'location': array([4.48357247, 6.70686119, 0.5614044 ]),
'material': ['wood', None],
'parent_room': 8,
'size': array([1.14685995, 0.68447991, 0.6124021 ]),
'surface_coverage': 1.3836484369355602,
'tactile_texture': None,
'visual_texture': None,
'volume': 0.27629888509653355}}
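If location and size turn out to be the center and full extents of an axis-aligned bounding box (that interpretation is discussed further down and still needs verification), a per-object lookup table could be sketched like this:

import numpy as np

# Sketch: id -> (class, AABB), assuming 'location' is the box center
# and 'size' the full extents -- not yet verified.
objects = {}
for oid, obj in data['object'].items():
    center = np.asarray(obj['location'])
    half = np.asarray(obj['size']) / 2.0
    objects[oid] = {
        'class': obj['class_'],
        'aabb_min': center - half,
        'aabb_max': center + half,
    }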
camera
>>> pprint.pprint(data['camera'])
{1: {'FOV': 1.0489180166567196,
'id': 1,
'location': array([6.19820356, 4.94441748, 1.27608538]),
'modality': 'RGB',
'name': 'point_0_view_0',
'parent_room': 11,
'resolution': array([1024, 1024]),
'rotation': [1.616633415222168, -0.01483128871768713, 1.8443574905395508]},
...
2863: {'FOV': 1.0376312872786024,
'id': 2863,
'location': array([3.41174531, 6.73860884, 1.23510659]),
'modality': 'RGB',
'name': 'point_9_view_4',
'parent_room': 11,
'resolution': array([1024, 1024]),
'rotation': [1.9443336725234985,
0.0011426351265981793,
-1.8368979692459106]}}
panorama
>>> pprint.pprint(data['panorama'])
{'p000001': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int16),
'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int16)},
'p000002': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int16),
'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int16)},
...
'p000026': {'object_class': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int16),
'object_instance': array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int16)}}
I was hoping to load the .npz file in C++ using cnpy, but the .npz contains pickled data, which cnpy can't handle. I could potentially handle the pickled data using http://www.picklingtools.com/.
Alternatively, I could do all the processing in Python, but the Python tools for writing a mesh are cumbersome and likely slow. So for now, I think I will write a Python script to convert the data I need into a format that can be easily loaded in C++, and then do the processing in C++.
The semantic mask information is located in:
data['building']['object_inst_segmentation']
which is an array with one entry per mesh face. Each element in the array contains the semantic object id for that face. If we don't have semantic information for that face, the id is 0.
The format of the array:
>>> type(data['building']['object_inst_segmentation'])
<class 'numpy.ndarray'>
>>> type(data['building']['object_inst_segmentation'][0])
<class 'numpy.ndarray'>
>>> data['building']['object_inst_segmentation'][0]
array([0.])
>>> data['building']['object_inst_segmentation'][0][0]
0.0
>>> type(data['building']['object_inst_segmentation'][0][0])
<class 'numpy.float64'>
We should be able to write out the object ids using the following code:
object_ids = data['building']['object_inst_segmentation']
with open("out.bin", "wb") as f:
    # One int16 object id per face, written as raw binary in native byte order.
    f.write(object_ids.astype(np.int16).tobytes())
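Continuing from the snippet above, the flat binary can be read back for a quick sanity check (or later from C++ as a plain array of 16-bit integers):

check = np.fromfile("out.bin", dtype=np.int16)
assert (check == object_ids.astype(np.int16).ravel()).all()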
Bounding box information:
Looking at the object schema (e.g. object 33 above, the dining table), each entry carries a location and a size:
'location': array([4.48357247, 6.70686119, 0.5614044 ]),
'size': array([1.14685995, 0.68447991, 0.6124021 ]),
location and size may be the axis-aligned bounding box (location as the center, size as the full extents). This will have to be verified, e.g. with the check sketched below.
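One way to verify: compute per-object vertex bounds from the mesh and the per-face segmentation, and compare against location/size. A sketch, assuming verts and faces hold the semantic mesh's (N, 3) vertex and (M, 3) face arrays in 3DSceneGraph coordinates:

import numpy as np

object_ids = data['building']['object_inst_segmentation'].ravel().astype(int)
obj = data['object'][33]                      # the dining table above

vert_idx = np.unique(faces[object_ids == obj['id']])
vmin = verts[vert_idx].min(axis=0)
vmax = verts[vert_idx].max(axis=0)

# If location/size are an AABB center and extents, these should match closely:
print((vmin + vmax) / 2, 'vs', obj['location'])
print(vmax - vmin, 'vs', obj['size'])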
Semantics seem to be working with the following transformation:
x1 = x0
y1 = -z0
z1 = y0
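Putting it together, a minimal end-to-end sketch of the .obj to semantic .ply conversion. The OBJ parser is deliberately simplified (triangles only), the per-face object_id property mirrors the convention Habitat's semantic meshes use elsewhere (this would need to be checked against the loader), and the file paths are hypothetical:

import numpy as np

def load_obj(path):
    # Minimal OBJ reader: positions ('v') and triangular faces ('f') only.
    verts, faces = [], []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == 'v':
                verts.append([float(c) for c in parts[1:4]])
            elif parts[0] == 'f':
                # OBJ indices are 1-based and may appear as v/vt/vn triplets.
                faces.append([int(p.split('/')[0]) - 1 for p in parts[1:4]])
    return np.asarray(verts), np.asarray(faces)

verts, faces = load_obj('Allensville.obj')            # hypothetical path
x, y, z = verts[:, 0], verts[:, 1], verts[:, 2]
verts_h = np.stack([x, -z, y], axis=1)                # x1 = x0, y1 = -z0, z1 = y0

object_ids = data['building']['object_inst_segmentation'].ravel().astype(int)
assert len(object_ids) == len(faces)

with open('Allensville_semantic.ply', 'w') as out:    # ASCII for readability
    out.write('ply\nformat ascii 1.0\n')
    out.write('element vertex %d\n' % len(verts_h))
    out.write('property float x\nproperty float y\nproperty float z\n')
    out.write('element face %d\n' % len(faces))
    out.write('property list uchar int vertex_indices\n')
    out.write('property int object_id\n')
    out.write('end_header\n')
    for v in verts_h:
        out.write('%f %f %f\n' % tuple(v))
    for f, oid in zip(faces, object_ids):
        out.write('3 %d %d %d %d\n' % (f[0], f[1], f[2], oid))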
Hello, I want to know whether we can get the rooms' centers and bounding boxes from Habitat in the Gibson dataset. I used 3DSceneGraph as the Gibson semantics but only get the SemanticObject class. Thanks!