Total3DUnderstanding
One Different Image Layout Estimation and Drawing 3D Layout
Hi, I really appreciate the project and hope it can be developed more :)
Now I'm trying to run only layout_estimation. My goal is to provide an image and get a 3D rendering of its layout.
Like this:
The first problem is how cam_K can be estimated. I have checked all the code samples. I can do the layout and cam_R estimation, but in all your samples you use the cam_K from the data to draw the 3D layout. How can I predict it, or is there any way to draw the 3D layout without cam_K?
The second problem is that I don't know whether I am doing this correctly, but when I tried to estimate the layouts of the demo data, my results were really bad. I followed the steps in demo.py to predict the layout points. For the weights, I first used your pretrained_model, then I trained for 100 epochs and tried those weights. The results were the same.
Here I used @chengzhag's layout_estimation.yaml:
def estimate(img_path):
    cfg = CONFIG("configs/layout_estimation.yaml")
    checkpoint = CheckpointIO(cfg)
    cfg = mount_external_config(cfg)
    device = load_device(cfg)
    cfg.config["mode"] = "demo"
    net = load_model(cfg, device=device)
    checkpoint.register_modules(net=net)
    cfg.config['demo_path'] = img_path
    data = load_demo_data(cfg.config['demo_path'], device)
    with torch.no_grad():
        est_data = net(data)
    lo_bdb3D_out = get_layout_bdb_sunrgbd(cfg.bins_tensor, est_data['lo_ori_reg_result'],
                                          torch.argmax(est_data['lo_ori_cls_result'], 1),
                                          est_data['lo_centroid_result'],
                                          est_data['lo_coeffs_result'])
    layout = lo_bdb3D_out[0, :, :].cpu().numpy()
    cam_R_out = get_rotation_matix_result(cfg.bins_tensor,
                                          torch.argmax(est_data['pitch_cls_result'], 1), est_data['pitch_reg_result'],
                                          torch.argmax(est_data['roll_cls_result'], 1), est_data['roll_reg_result'])
    pre_cam_R = cam_R_out[0, :, :].cpu().numpy()
    pre_layout = format_layout(layout)
    return pre_layout, pre_cam_R
To draw the 3D layout (I take cam_K from the sample; that part is not shown here):
img_path = "./demo/inputs/1"
sequence_id = img_path[-1]
rgb_image = np.asarray(Image.open(img_path + "/img.jpg").convert('RGB'))
pre_layout, pre_cam_R = estimate(img_path)
scene_box = Box(rgb_image, None, cam_K, None, pre_cam_R, None,
                pre_layout, None, None, 'prediction', None)
scene_box.draw3D(if_save=True, save_path='./demo/sunrgbd/%s_recon.png' % (sequence_id))
I got results like this:
It should look like this:
I hope I have expressed myself clearly. Thank you very much ^^
Hi,
In the paper, we actually require the camera intrinsics (i.e., cam_K); otherwise, the problem would be extremely ambiguous.
For the layout estimation in our demo, our prediction is here. The figure is produced exactly by our demo code.
Best, Yinyu
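A note on the cam_K question, for images that come with no calibration data: the reply above says the method needs the real intrinsics, so anything else is only a rough stand-in. One common approximation is to assume a pinhole camera with square pixels, a principal point at the image centre, and a guessed horizontal field of view. The sketch below does that in plain Python. The function names intrinsics_from_fov and project are hypothetical and are not part of Total3DUnderstanding, the 60-degree default is an arbitrary guess, and the projection uses a generic camera frame (x right, y down, z forward) that may not match the axis convention used in this repository.

```python
import math


def intrinsics_from_fov(width, height, hfov_deg=60.0):
    """Build an approximate 3x3 pinhole intrinsic matrix (hypothetical helper).

    Assumes square pixels, no skew, and a principal point at the image centre.
    hfov_deg is a guessed horizontal field of view, not a measured value.
    """
    # Focal length in pixels: f = (W / 2) / tan(hfov / 2).
    f = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    cx, cy = width / 2.0, height / 2.0
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]


def project(K, point_cam):
    """Project a 3D point given in a generic camera frame to pixel coordinates."""
    x, y, z = point_cam
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


if __name__ == "__main__":
    K = intrinsics_from_fov(640, 480, hfov_deg=60.0)
    print(K)
    # A point on the optical axis lands on the principal point.
    print(project(K, (0.0, 0.0, 2.0)))
```

If the image has EXIF data with the focal length and sensor size, those would give a better focal length than a guessed field of view. Either way, a layout drawn with an approximate cam_K can look noticeably different from one drawn with the dataset's calibrated matrix.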