
Real-time pose estimation

Open Lanzo98 opened this issue 1 year ago • 9 comments

Hi, I'm trying to use DPVO to get the pose for each frame in real time, for example by printing xyz and a quaternion. Do you have any hints for this? Looking at the function https://github.com/princeton-vl/DPVO/blob/4f2f0cc7efbfe2547e788844412a3a2a72a923bd/dpvo/dpvo.py#L153 I'm trying to print lietorch.SE3(slam.poses_[-1]).data.cpu().numpy(), but I always get the value [0. 0. 0. 0. 0. 0. 1.].

Here is the modified loop inside the run function of demo.py:

while 1:
    (t, image, intrinsics) = queue.get()
    if t < 0: break

    image = torch.from_numpy(image).permute(2, 0, 1).cuda()
    intrinsics = torch.from_numpy(intrinsics).cuda()

    if slam is None:
        slam = DPVO(cfg, network, ht=image.shape[1], wd=image.shape[2], viz=viz)

    with Timer("SLAM", enabled=timeit):
        slam(t, image, intrinsics)

    print(lietorch.SE3(slam.poses_[-1]).data.cpu().numpy())

What am I doing wrong? Thanks in advance

Lanzo98 avatar Mar 30 '23 07:03 Lanzo98

You should be doing print(lietorch.SE3(slam.poses_[n-1]).data.cpu().numpy())

EDIT: (See next comment)

lahavlipson avatar Apr 03 '23 03:04 lahavlipson

Thanks, it works! I'm also struggling to understand the format of the pose stored in the poses_ variable. SE3 should be [x, y, z, qw, qx, qy, qz], right? I saved the values returned by lietorch.SE3(slam.poses_[n-1]).data.cpu().numpy() and the poses returned by the terminate function. They are quite different (tested on the iPhone IMG_0492.MOV video), and I discovered this is due to the inverse function here: https://github.com/princeton-vl/DPVO/blob/4f2f0cc7efbfe2547e788844412a3a2a72a923bd/dpvo/dpvo.py#L168 I can't use that since I'm working with single pose values. I also found that some conversions are done for the CUDA viewer. Is there a different way of displaying the correct values [x, y, z, qw, qx, qy, qz]? Do you have any hints?

Lanzo98 avatar Apr 03 '23 08:04 Lanzo98

Internally, DPVO stores poses as a mapping from world coordinates to camera coordinates. The actual camera poses are the inverse of this, i.e. a mapping from camera coordinates to world coordinates. So to return the correct camera poses per-frame, you should actually do

lietorch.SE3(slam.poses_[n-1]).inv().data.cpu().numpy()

FYI different datasets and libraries represent rotation quaternions differently, usually either [qx, qy, qz, qw] (e.g. lietorch) or [qw, qx, qy, qz] (e.g. pytorch3d)
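For example, a lietorch pose vector can be reordered into the scalar-first [x, y, z, qw, qx, qy, qz] convention with a simple index swap. A minimal numpy sketch (the pose values here are made up for illustration):

```python
import numpy as np

# lietorch SE3 .data layout: [tx, ty, tz, qx, qy, qz, qw] (scalar-last)
p = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 1.0])

# Reorder the quaternion to scalar-first [qw, qx, qy, qz]:
p_wxyz = np.concatenate([p[:3], p[[6, 3, 4, 5]]])
# p_wxyz is now [1., 2., 3., 1., 0., 0., 0.]
```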

lahavlipson avatar Apr 03 '23 20:04 lahavlipson

Thank you very much, all clear now.

Lanzo98 avatar Apr 04 '23 07:04 Lanzo98

Just a side comment on this, @lahavlipson: you say the last pose is given by lietorch.SE3(slam.poses_[n-1]).inv().data.cpu().numpy(), but n is the keyframe number, so you're not actually giving the pose of the last frame, but of the last keyframe, which might be far back in time depending on your config.

The real pose corresponding to time t (which I guess is what @Lanzo98 is asking for) is only obtainable after running slam.terminate(), which interpolates the missing poses in between keyframes, recovering the "real" pose at time t. Correct me if I'm wrong.
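Conceptually, interpolating a pose between two neighboring keyframe poses amounts to linearly interpolating the translation and slerping the quaternion. A rough numpy sketch of that idea (illustrative only, not the actual DPVO implementation):

```python
import numpy as np

def slerp(q0, q1, alpha):
    """Spherical linear interpolation between unit quaternions [x, y, z, w]."""
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the short path around the sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + alpha * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - alpha) * theta) * q0 +
            np.sin(alpha * theta) * q1) / np.sin(theta)

def interp_pose(p0, p1, alpha):
    """Interpolate two [tx, ty, tz, qx, qy, qz, qw] poses at fraction alpha."""
    t = (1 - alpha) * p0[:3] + alpha * p1[:3]
    q = slerp(p0[3:], p1[3:], alpha)
    return np.concatenate([t, q])
```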

Is it really feasible in DPVO to get the estimated pose for each new frame?

senecobis avatar Jun 30 '23 16:06 senecobis

@senecobis DPVO treats every new frame as a keyframe, and only removes keyframes when they are 3 timesteps old or fall out of the optimization window. So keyframes t-1, t-2 and t-3 are indeed the most recent 3 frames

lahavlipson avatar Jun 30 '23 16:06 lahavlipson

@lahavlipson but then what is the whole purpose of this code part inside the DPVO evaluation script?

I understood that, if the estimated flow between i and j is less than self.cfg.KEYFRAME_THRESH (which is 15.0), then we remove the last frame from the keyframes, where i is the 5th-to-last keyframe (since self.cfg.KEYFRAME_INDEX = 4) and j is the 3rd-to-last keyframe.

Or were you referring to training? In training it is certainly true that every frame is a keyframe.

senecobis avatar Jul 04 '23 15:07 senecobis

@senecobis Your understanding of the code is almost correct: if the flow between frames (n-3) and (n-5) is small, we don't remove the last frame; we remove the in-between frame (n-4).
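In other words, the rule can be sketched like this (an illustrative sketch, not the actual DPVO code; the constants come from the default config values mentioned above):

```python
KEYFRAME_INDEX = 4      # self.cfg.KEYFRAME_INDEX in the default config
KEYFRAME_THRESH = 15.0  # self.cfg.KEYFRAME_THRESH in the default config

def keyframe_to_remove(flow, n):
    """Given a flow(i, j) estimator and current frame count n, return the
    index of the keyframe to drop, or None if all keyframes are kept.

    If the flow between frames (n-5) and (n-3) is below the threshold,
    the in-between frame (n-4) is removed."""
    i = n - KEYFRAME_INDEX - 1   # the 5th-to-last frame (n-5)
    j = n - KEYFRAME_INDEX + 1   # the 3rd-to-last frame (n-3)
    if flow(i, j) < KEYFRAME_THRESH:
        return n - KEYFRAME_INDEX  # the in-between frame (n-4)
    return None
```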

lahavlipson avatar Jul 04 '23 19:07 lahavlipson

Hi @lahavlipson, I'm publishing the result of lietorch.SE3(slam.poses_[n-1]).inv().data.cpu().numpy() in a PoseStamped ROS message. I can view the position correctly, but the rotation quaternion seems wrong (please see the image). Do you have any idea why? I'm taking into account that the quaternion part of the pose vector is in [qx, qy, qz, qw] order.

[Screenshot from 2024-02-07 11-51-49]
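For reference, the ROS geometry_msgs/Quaternion message also uses (x, y, z, w) field order, so the lietorch components should map across directly. This is the mapping I'm assuming, sketched without a ROS dependency (the pose values are placeholders):

```python
import numpy as np

# lietorch SE3 .data layout: [tx, ty, tz, qx, qy, qz, qw]
vec = np.array([0.1, 0.2, 0.3, 0.0, 0.0, 0.7071068, 0.7071068])

# geometry_msgs/Pose uses the same scalar-last quaternion order,
# so the components map across one-to-one:
pose = {
    "position":    {"x": vec[0], "y": vec[1], "z": vec[2]},
    "orientation": {"x": vec[3], "y": vec[4], "z": vec[5], "w": vec[6]},
}
```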

spokV avatar Feb 07 '24 09:02 spokV