MonoSDF: how to use intrinsics and extrinsics from COLMAP?

Hi,

I have a custom image dataset and ran COLMAP to get the intrinsics and extrinsics of all the images. How should I proceed from there to run MonoSDF and get the mesh?

Thanks!

Jiakui

Oct 12 '22 07:10 Jiakui

Hi, I think you could adapt this script https://github.com/autonomousvision/monosdf/blob/main/preprocess/scannet_to_monosdf.py to prepare the data
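
For readers starting from COLMAP's text export, here is a minimal sketch of getting camera-to-world poses to feed into such a script. It assumes the standard `images.txt` pose line (`IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME`), which stores the world-to-camera transform, and an image name without spaces. The helper names `qvec_to_rotmat` and `colmap_line_to_c2w` are made up for illustration and are not part of COLMAP or MonoSDF.

```python
import numpy as np

def qvec_to_rotmat(qw, qx, qy, qz):
    """Rotation matrix from a unit quaternion in (w, x, y, z) order."""
    return np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qw * qz), 2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy), 2 * (qy * qz + qw * qx), 1 - 2 * (qx * qx + qy * qy)],
    ])

def colmap_line_to_c2w(line):
    """Parse one pose line of images.txt; return (image name, 4x4 camera-to-world)."""
    fields = line.split()
    qw, qx, qy, qz, tx, ty, tz = map(float, fields[1:8])
    w2c = np.eye(4)
    w2c[:3, :3] = qvec_to_rotmat(qw, qx, qy, qz)
    w2c[:3, 3] = [tx, ty, tz]
    # COLMAP stores world-to-camera, so invert to get camera-to-world.
    return fields[9], np.linalg.inv(w2c)

# Hypothetical pose line: identity rotation, translation (1, 2, 3).
name, c2w = colmap_line_to_c2w("1 1 0 0 0 1 2 3 1 img_0001.png")
```

With an identity rotation, the camera center in world space is simply the negated translation, which makes a quick sanity check before converting a whole dataset.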

Oct 12 '22 08:10 niujinshuchong

Hi, author. I ran COLMAP and used the following code to generate 'cameras.npz'.

    import os.path as osp
    import numpy as np

    # fx, fy, cx, cy, w, h come from COLMAP; poses is an (N, 4, 4) array of
    # camera poses; path is the output folder.
    intrin_mat = np.array([
        [fx,  0, cx],
        [ 0, fy, cy],
        [ 0,  0,  1]
    ], dtype=np.float32)  # float dtype so the in-place updates below are safe
    # Shift the principal point for a center crop, then rescale to 384 px.
    intrin_mat[0, 2] -= (w - min(w, h)) / 2.
    intrin_mat[1, 2] -= (h - min(w, h)) / 2.
    intrin_mat[:2, :] *= 384. / min(w, h)

    K = np.eye(4, dtype=np.float32)
    K[:3, :3] = intrin_mat

    # Normalize the scene from the bounding box of the valid camera positions.
    valid_poses = np.isfinite(poses).all(axis=2).all(axis=1)
    min_vertices = poses[:, :3, 3][valid_poses].min(axis=0)
    max_vertices = poses[:, :3, 3][valid_poses].max(axis=0)
    scale_mat = np.eye(4, dtype=np.float32)
    scale_mat[:3, 3] = - (min_vertices + max_vertices) / 2.
    scale_mat[:3] *= 2. / (np.max(max_vertices - min_vertices) + 3.)
    scale_mat = np.linalg.inv(scale_mat)

    cams = {}
    idx = 0
    for pose, valid in zip(poses, valid_poses):
        if not valid:
            continue
        # world_mat is the projection matrix: intrinsics times inverted pose.
        pose = K @ np.linalg.inv(pose)
        cams[f"scale_mat_{idx:d}"] = scale_mat
        cams[f"world_mat_{idx:d}"] = pose
        idx += 1
    np.savez(osp.join(path, "cameras.npz"), **cams)

The losses seem to converge, but the mesh looks strange. [image: 221012_00] The normal result from training is as follows. [image: 221012_01] I noticed the following code in 'scene_dataset.py' and commented it out, but the result is still bad.

    elif center_crop_type == 'center_crop_for_dtu':
        scale = 384 / 1200
        offset = (1600 - 1200) * 0.5
        intrinsics[0, 2] -= offset
        intrinsics[:2, :] *= scale
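
As a rough illustration of what this branch does, the sketch below generalizes the same center crop and resize to an arbitrary image size. The function name `crop_and_resize_intrinsics` and the focal length and principal point values are hypothetical, chosen only to make the arithmetic easy to follow. The adjustment should be applied only once, either when writing 'cameras.npz' or in the dataset loader.

```python
import numpy as np

def crop_and_resize_intrinsics(K, w, h, target=384):
    """Adjust a 3x3 intrinsics matrix for a center crop to a square,
    followed by a resize of that square to target x target pixels."""
    K = np.array(K, dtype=np.float64)  # work on a copy
    side = min(w, h)
    K[0, 2] -= (w - side) / 2.0   # crop shifts the principal point in x
    K[1, 2] -= (h - side) / 2.0   # and in y (zero for landscape images)
    K[:2, :] *= target / side     # resize scales focal lengths and principal point
    return K

# Hypothetical intrinsics for a 1600x1200 image (DTU resolution).
K_dtu = np.array([[2900.0, 0.0, 800.0],
                  [0.0, 2900.0, 600.0],
                  [0.0, 0.0, 1.0]])
K_new = crop_and_resize_intrinsics(K_dtu, 1600, 1200)
```

For the 1600x1200 case this reproduces the hard-coded values above: an offset of 200 px in x and a scale of 384 / 1200.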

I guess the issue lies in how the camera poses are transformed. Do you have any idea?

Oct 12 '22 11:10 CCOSerika

@CCOSerika Sorry for the late reply. This line

    scale_mat[:3] *= 2. / (np.max(max_vertices - min_vertices) + 3.)

is used for ScanNet (indoor scenes), where the cameras are inside the rooms. For DTU-style cases it should be changed to something like

    scale_mat[:3] *= 3. / (np.max(max_vertices - min_vertices))
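
To make the difference concrete, here is a minimal sketch that builds the normalization both ways from the camera centers. The function name `build_scale_mat` and the `object_centric` flag are made up for this example, and the camera centers are toy values.

```python
import numpy as np

def build_scale_mat(cam_centers, object_centric=True):
    """Return scale_mat (normalized space -> world) from (N, 3) camera centers."""
    lo = cam_centers.min(axis=0)
    hi = cam_centers.max(axis=0)
    extent = np.max(hi - lo)
    norm = np.eye(4)
    norm[:3, 3] = -(lo + hi) / 2.0          # move the bounding-box center to the origin
    if object_centric:
        norm[:3] *= 3.0 / extent            # DTU-style: cameras surround the object
    else:
        norm[:3] *= 2.0 / (extent + 3.0)    # ScanNet-style: cameras inside the room
    return np.linalg.inv(norm)

# Toy camera centers with a largest extent of 10 along x.
centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [5.0, 4.0, 2.0]])
scale_mat = build_scale_mat(centers)

# Map the centers into normalized space to inspect the result.
to_norm = np.linalg.inv(scale_mat)
normalized = centers @ to_norm[:3, :3].T + to_norm[:3, 3]
```

With the DTU-style factor, the camera centers span 3 units along their largest axis in normalized space, so the object they look at sits near the origin, well inside the cameras.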

Oct 15 '22 07:10 niujinshuchong

Sorry, I want to ask whether the pose conversion is only used to generate the mesh in the evaluation phase?

Oct 30 '22 03:10 Thermaloo

@MingRuiye Sorry, I missed your messages. I'm not sure I understand your question. The pose conversion is used in the evaluation phase because we need to convert the extracted mesh back to the original world space.
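
As a rough sketch of that last step, assuming `scale_mat` maps normalized coordinates to world coordinates (as in the 'cameras.npz' convention discussed above), converting mesh vertices back could look like this. The function name `to_world` and the numbers are illustrative only, not taken from the MonoSDF code.

```python
import numpy as np

def to_world(vertices, scale_mat):
    """Apply a 4x4 normalized-to-world transform to (N, 3) mesh vertices."""
    return vertices @ scale_mat[:3, :3].T + scale_mat[:3, 3]

# Hypothetical scale_mat: uniform scale 2.5 and translation (1, -2, 0.5).
scale_mat = np.eye(4)
scale_mat[:3, :3] *= 2.5
scale_mat[:3, 3] = [1.0, -2.0, 0.5]

verts_norm = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
verts_world = to_world(verts_norm, scale_mat)
```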

Dec 08 '22 12:12 niujinshuchong