Mapping from 3D object coordinates to 2D pixel coordinates
I want to find the function that maps a 3D object coordinate from the annotations to a (pix_height, pix_width) position in the 320x480 pixel output image. Such a coordinate can be found in e.g. CLEVRER/train/annotation_train/annotation_00000-01000/annotation_00000.json['motion_trajectory'][0]['objects'][0]['location'], and looks like e.g. (coord_x, coord_y, coord_z) = (1.3234, 2.7147, 0.2).
I can approximate it roughly linearly with
pix_width = int(coord_y * 80 + 240)
pix_height = int(coord_x * 34 + 160)
but the relation doesn't seem to be exactly linear.
Hence, I was wondering whether there is a ground-truth mapping (e.g. the camera parameters used for rendering) that I overlooked somewhere and that would give me the exact mapping :)
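My guess is that the non-linearity comes from the perspective camera: the projection divides by depth, so a purely affine fit can only be approximate. In case it helps to make the question concrete, here is a minimal sketch of what I could do in the meantime, assuming all objects sit on roughly the same ground plane (constant coord_z): fit a plane-to-image homography from a few hand-labelled correspondences. The names fit_homography and project are my own, not part of CLEVRER, and I don't know the actual camera parameters; objects at different heights would need a full 3x4 camera matrix instead.

```python
import numpy as np

def fit_homography(ground_xy, pixels_uv):
    """Fit a 3x3 matrix H with [u, v, 1] ~ H @ [x, y, 1] (direct linear transform).

    ground_xy: list of (coord_x, coord_y) on the ground plane
    pixels_uv: list of matching (u, v) pixel positions, at least 4 pairs
    """
    rows = []
    for (x, y), (u, v) in zip(ground_xy, pixels_uv):
        # Each correspondence contributes two linear constraints on H.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The least-squares solution is the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2, 2] != 0)

def project(H, x, y):
    """Map a ground-plane point to pixel coordinates, including the divide."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

With four or more (location, pixel) pairs read off a few frames, project(H, coord_x, coord_y) would then replace the two linear formulas above. This is only a fitted approximation, so I'd still prefer the real camera parameters if they exist.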
As a follow-up question:
The README mentions that visual masks can be found here.
I was thinking I could use the visual mask annotations as a way to get the pixel location of each object's center.
When I extract the .tar.gz at that link, I get a list of JSON files, one per video, where each JSON looks like this:
I couldn't figure out where the mask information is stored here; I suspect it's in the 'counts' field, whose value is a long, random-looking string, but I'm not sure how to decode that string.
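One guess on my side: a 'counts' string next to a 'size' field looks like the compressed run-length encoding (RLE) used by the COCO annotation format. I haven't confirmed that CLEVRER uses it, so treat the following as a sketch under that assumption. It is a plain-Python port of the COCO string decoding; decode_counts, rle_to_mask and mask_center are names I made up, and I'm assuming 'size' is [height, width].

```python
def decode_counts(counts):
    """Turn a COCO-style compressed RLE string into a list of run lengths."""
    runs = []
    pos = 0
    while pos < len(counts):
        value, shift, more = 0, 0, True
        while more:
            c = ord(counts[pos]) - 48      # each character carries 6 bits
            value |= (c & 0x1F) << shift   # low 5 bits are payload
            more = bool(c & 0x20)          # bit 5 flags a continuation
            pos += 1
            shift += 5
            if not more and (c & 0x10):    # sign-extend negative values
                value |= -1 << shift
        if len(runs) > 2:                  # later runs are stored as deltas
            value += runs[-2]
        runs.append(value)
    return runs

def rle_to_mask(rle):
    """Expand {'size': [h, w], 'counts': str} into h rows of w 0/1 values."""
    height, width = rle["size"]
    flat, bit = [], 0
    for run in decode_counts(rle["counts"]):
        flat.extend([bit] * run)           # runs alternate 0s, 1s, 0s, ...
        bit = 1 - bit
    # COCO masks are stored column by column, so index accordingly.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]

def mask_center(mask):
    """Mean (pix_height, pix_width) of all mask pixels, or None if empty."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)
```

If the assumption holds, pycocotools.mask.decode should give the same mask directly as a NumPy array, which would be the more practical route than the hand-written decoder.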
Could you help with this? :)