Pose estimation without tracking, poor performance
When doing pose estimation only (no tracking), what is the suggested format for the RGB, depth, and mask images? I have pieced together some of your code as shown below, but I am getting poor results.
Also, to clarify: what does the returned pose correspond to, the transformation of the object with respect to the camera, or the other way around?
```python
import cv2
import imageio
import numpy as np

# Note: cv2.resize takes dsize as (width, height); make sure (480, 640)
# actually matches your camera resolution and the intrinsics matrix K.
def get_color(image_input):
    color = imageio.imread(image_input)[..., :3]
    color = cv2.resize(color, (480, 640), interpolation=cv2.INTER_NEAREST)
    return color

def get_depth(depth_input):
    # Read the raw 16-bit depth image and convert millimeters -> meters.
    depth = cv2.imread(depth_input, -1) / 1e3
    depth = cv2.resize(depth, (480, 640), interpolation=cv2.INTER_NEAREST)
    depth[(depth < 0.1) | (depth >= np.inf)] = 0  # zero out invalid pixels
    return depth

def get_mask(mask_input):
    mask = cv2.imread(mask_input, -1)
    if len(mask.shape) == 3:
        # Take the first non-empty channel of a 3-channel mask.
        for c in range(3):
            if mask[..., c].sum() > 0:
                mask = mask[..., c]
                break
    mask = cv2.resize(mask, (480, 640),
                      interpolation=cv2.INTER_NEAREST).astype(bool).astype(np.uint8)
    return mask

peg_rgb_image = get_color(rgb_image_path)
depth_image = get_depth(depth_image_path)
mask_image = get_mask(mask_image_path).astype(bool)

cv2.imshow('1', peg_rgb_image)
cv2.waitKey(0)  # wait indefinitely until a key is pressed
cv2.destroyAllWindows()

pose = est.register(K=rros_camera_I_matrix, rgb=peg_rgb_image, depth=depth_image,
                    ob_mask=mask_image, iteration=est_refine_iter)
center_pose = pose @ np.linalg.inv(to_origin)

vis = draw_posed_3d_box(rros_camera_I_matrix, img=peg_rgb_image,
                        ob_in_cam=center_pose, bbox=bbox)
vis = draw_xyz_axis(peg_rgb_image, ob_in_cam=center_pose, scale=0.1,
                    K=rros_camera_I_matrix, thickness=3, transparency=0,
                    is_input_rgb=True)

# cv2.imshow('1', vis[...,::-1])
cv2.imshow('1', vis)
cv2.waitKey(0)  # wait indefinitely until a key is pressed
cv2.destroyAllWindows()
```
@wenbowen123
Hi, these are typically issues with how your data is set up, in particular the depth format. I'd suggest searching the existing issues, as there are a couple of related ones.
The estimated poses are object-to-camera, i.e. the object's pose expressed in the camera frame.
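Since the depth format is the usual culprit, a quick sanity check can be run on the loaded depth map. This is a sketch assuming the common convention of a 16-bit PNG storing millimeters (verify against your own sensor; `check_depth_meters` and its range thresholds are illustrative, not part of the library):

```python
import numpy as np

def check_depth_meters(depth, zmin=0.1, zmax=4.0):
    """Return the fraction of pixels in a plausible indoor range,
    assuming the map is supposed to be in meters."""
    valid = (depth > zmin) & (depth < zmax) & np.isfinite(depth)
    return valid.mean()

# Simulate a 16-bit depth PNG stored in millimeters.
raw_mm = np.full((480, 640), 1500, dtype=np.uint16)  # 1.5 m everywhere
depth_m = raw_mm.astype(np.float32) / 1e3            # convert mm -> meters

print(check_depth_meters(depth_m))                    # -> 1.0 (all pixels valid)
print(check_depth_meters(raw_mm.astype(np.float32)))  # -> 0.0 (forgot the /1e3)
```

If the valid fraction is near zero, the units are almost certainly wrong (e.g. millimeters fed in where meters are expected).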
Hi @dhanushTDS and @wenbowen123, to perform pose estimation on multiple images, do we need to provide a mask for each input RGB image?
If we do what @wenbowen123 mentioned and do not track, i.e. we treat each image as a sample from an unordered sequence, then we have to supply a mask to the estimator for each image.
In their example code, they track across the frames of a continuous video and only need the mask for the first image.
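The control-flow difference between the two modes can be sketched as follows. `StubEstimator` is a stand-in written for illustration only (the real estimator's `register` and `track_one` take the actual images, intrinsics, and iteration counts):

```python
class StubEstimator:
    """Stand-in for the pose estimator; records which method handles each frame."""
    def register(self, rgb, depth, ob_mask, **kw):
        return "register"   # full registration, needs a mask
    def track_one(self, rgb, depth, **kw):
        return "track_one"  # refinement from the previous pose, no mask needed

est = StubEstimator()
frames = [("rgb0", "d0", "mask0"), ("rgb1", "d1", "mask1"), ("rgb2", "d2", "mask2")]

# Unordered images: register() runs on every frame, so every frame needs a mask.
unordered = [est.register(rgb=r, depth=d, ob_mask=m) for r, d, m in frames]

# Continuous video: register() once with the first frame's mask, then track.
video = [est.register(rgb=frames[0][0], depth=frames[0][1], ob_mask=frames[0][2])]
video += [est.track_one(rgb=r, depth=d) for r, d, _ in frames[1:]]

print(unordered)  # ['register', 'register', 'register']
print(video)      # ['register', 'track_one', 'track_one']
```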
Thanks @dhanushTDS
We just updated the README; it now has a troubleshooting section. Please check it out.