monodepth2 Flip augmentation leads to bad PoseNet

Hi, I trained Monodepth2 on the KITTI Odometry dataset twice, the first time with flip augmentation enabled and the second time with it disabled, and got very different trajectories on the test sequences Seq.09 and Seq.10. The model trained without flip augmentation gives a much better global trajectory than the model trained with it.

with flip augmentation: sequence_09.pdf
without flip augmentation: sequence_09.pdf

I am wondering why this happens. I expected PoseNet to perform better with flip augmentation enabled, as DepthNet does, since more data is leveraged. Has anyone run into this before, or does anyone have ideas about it?
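For context, flip augmentation in a self-supervised setup like this one typically mirrors every frame of a training snippet together, so that the target and source frames remain related by a valid camera motion. A minimal sketch, assuming the frames are H x W x 3 NumPy arrays stored in a dict keyed by frame offset (a hypothetical layout, not necessarily Monodepth2's own data structure); `maybe_flip_snippet` is a hypothetical helper name:

```python
import numpy as np

def maybe_flip_snippet(frames, rng, p=0.5):
    """Horizontally flip every frame in a training snippet with probability p.

    `frames` is a hypothetical dict mapping frame offset (e.g. -1, 0, 1) to an
    H x W x 3 array. One coin flip is shared by all frames so that the relative
    motion between them stays a consistent (mirrored) camera motion.
    """
    if not rng.random() < p:
        return frames, False
    # [:, ::-1] reverses the width axis, i.e. a left-right mirror of each frame.
    flipped = {k: np.ascontiguousarray(img[:, ::-1]) for k, img in frames.items()}
    return flipped, True
```

Drawing one coin per snippet rather than per frame is the important design choice: flipping only some frames of a snippet would produce image pairs that no rigid camera motion can explain.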

All the best.

May 20 '21 06:05 Beniko95J

This is really interesting, thanks for sharing.

Of course it's possible there's a bug in our code around flip augmentation.

But assuming there isn't, perhaps this has to do with the strong priors around car motion: the KITTI sequences are filmed in Germany, where cars drive on the right. With flip augmentation on, the pose network has to learn both 'drive-on-left' and 'drive-on-right' versions of the world, whereas with it off the pose network only ever has to learn the 'correct' drive-on-right scenario. (This is similar to ideas expressed in the visual chirality paper: https://linzhiqiu.github.io/papers/chirality/)
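The 'two versions of the world' idea can be made concrete. Assuming the principal point sits at the image centre and the usual camera convention (x right, y down, z forward), a horizontally flipped video is what a camera would see in a world mirrored across its own x = 0 plane, so each relative pose T becomes S T S with S = diag(-1, 1, 1, 1). The yaw changes sign (a turn one way becomes a turn the other way) and lateral translation changes sign, while forward motion is unchanged. A small NumPy sketch; `yaw_pose` and `mirror_pose` are hypothetical helper names:

```python
import numpy as np

# Reflection across the camera's x = 0 plane (assumed convention: x right, y down, z forward).
S = np.diag([-1.0, 1.0, 1.0, 1.0])

def yaw_pose(yaw_rad, forward, lateral):
    """4x4 rigid motion: rotation about the y axis plus a translation (lateral, 0, forward)."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T[:3, 3] = [lateral, 0.0, forward]
    return T

def mirror_pose(T):
    """Relative pose as seen in the horizontally flipped sequence (S is its own inverse)."""
    return S @ T @ S

T = yaw_pose(np.deg2rad(5.0), forward=1.0, lateral=0.1)
T_flip = mirror_pose(T)  # equals yaw_pose(-5 degrees, forward=1.0, lateral=-0.1)
```

So with flip augmentation the pose network sees both T and its mirror image for what is, visually, the same kind of scene, which is one way to read the chirality argument.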

Just a thought.

I don't suppose you also tried training a depth model without flip aug? Or got pose benchmark numbers from your pose model? Both of those results might be insightful!

Thanks,

Michael

May 20 '21 14:05 mdfirman

Thanks for the reply! @mrharicot

I think your idea and the idea expressed in Visual Chirality do give some insights on this problem.

I don't suppose you also tried training a depth model without flip aug?

Yeah, I trained DepthNet along with PoseNet, so I also have DepthNet models trained with flip augmentation enabled and disabled. The depth model shows a slight drop in performance without flip aug, which is the opposite of what happens with PoseNet.

In addition, I trained DepthNet and PoseNet on flipped versions of the training sequences (the training data contains flipped images only) to see what would happen. The DepthNet performance drops from abs=0.121 to abs=0.138, while PoseNet gives completely wrong global trajectories on the unflipped Seq.09 and Seq.10.

It seems the effect of visual chirality also depends on the task: it affects depth estimation less and pose estimation more.

Or got pose benchmark numbers from your pose model?

I evaluate PoseNet by computing the estimated global trajectory and running the KITTI Odometry Benchmark evaluation, which calculates translational and rotational errors for all possible subsequences of length 100 m to 800 m. PoseNet without flip aug gives roughly t_rel=5.7, r_rel=1.9, and PoseNet with flip aug gives t_rel=11.1, r_rel=4.8. PoseNet trained only on flipped images does even worse than PoseNet with flip aug.
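For readers unfamiliar with the benchmark metric: the KITTI odometry devkit steps through start frames (10 at a time), and for each segment length from 100 m to 800 m it compares the ground-truth and estimated motion over that segment, dividing the error by the segment length. Below is a simplified NumPy re-implementation of that idea, not the official devkit code. It assumes both trajectories are equal-length lists of 4x4 camera-to-world poses already at the same metric scale, and reports t_rel in % and r_rel in degrees per 100 m, the units commonly used in papers; all function names are hypothetical:

```python
import numpy as np

LENGTHS = [100, 200, 300, 400, 500, 600, 700, 800]  # segment lengths in metres
STEP = 10  # stride between start frames

def trajectory_distances(poses):
    """Cumulative distance travelled along a trajectory of 4x4 poses."""
    dists = [0.0]
    for a, b in zip(poses[:-1], poses[1:]):
        dists.append(dists[-1] + np.linalg.norm(b[:3, 3] - a[:3, 3]))
    return dists

def rotation_angle(R):
    """Rotation angle (rad) of a 3x3 rotation matrix, clipped for numerical safety."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

def kitti_relative_errors(gt, pred):
    """Average translational (%) and rotational (deg/100 m) segment errors."""
    dists = trajectory_distances(gt)
    t_errs, r_errs = [], []
    for first in range(0, len(gt), STEP):
        for length in LENGTHS:
            # First frame whose travelled distance exceeds the segment length.
            last = next((i for i in range(first, len(gt))
                         if dists[i] > dists[first] + length), None)
            if last is None:
                continue
            gt_rel = np.linalg.inv(gt[first]) @ gt[last]
            pred_rel = np.linalg.inv(pred[first]) @ pred[last]
            err = np.linalg.inv(pred_rel) @ gt_rel  # residual motion over the segment
            t_errs.append(np.linalg.norm(err[:3, 3]) / length)
            r_errs.append(rotation_angle(err[:3, :3]) / length)
    if not t_errs:
        raise ValueError("trajectory is shorter than the smallest segment length")
    t_rel = 100.0 * np.mean(t_errs)              # percent
    r_rel = np.degrees(np.mean(r_errs)) * 100.0  # degrees per 100 m
    return t_rel, r_rel
```

Because errors are measured on local segments rather than on the final drift, a pose network that systematically mis-estimates turns (as a chirality-confused one might) is penalised on every segment containing a turn.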

Hoping we can have more discussions on this!

May 21 '21 03:05 Beniko95J

Hi @Beniko95J , thanks for reporting back and for your updated numbers.

This is very interesting.

I'm glad to see that depth estimation benefits from flip augmentation!

The results of your pose experiments really do suggest that flipped images hurt pose estimates. I feel I should watch the KITTI sequences again (in flipped and unflipped versions!) to see if there's anything obvious which would be hurt by learning on flipped images.

I think there's still a small chance that we have a bug e.g. around use of intrinsics in flipped images. If so, this type of bug might affect pose estimation more than depth, hence the effect you're seeing here. (But I much prefer the visual chirality explanation!)

May 21 '21 07:05 mdfirman

@mdfirman Even though it's mentioned not to horizontally flip the image if the principal point is far from the center, is it advisable to update the principal point instead when it is far from the center and the image needs to be flipped horizontally? That is, (cx, cy) would be updated to (image_width - cx, cy).
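The principal-point update proposed in this comment can be checked numerically. Under the continuous pixel convention, where a horizontal flip maps column coordinate u to W - u, flipping the image is equivalent to keeping fx, fy and cy, replacing cx with W - cx, and mirroring the scene (X to -X in camera coordinates). With integer pixel-centre indexing the flip is u to W - 1 - u instead, so the matching update would be W - 1 - cx; for intrinsics normalised by image width it would be 1 - cx. A sketch with illustrative, made-up intrinsics (not a real KITTI calibration); `project` and `flip_intrinsics` are hypothetical helper names:

```python
import numpy as np

def project(K, X):
    """Pinhole projection of camera-frame points X (N x 3) to pixel coordinates (N x 2)."""
    uvw = (K @ X.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def flip_intrinsics(K, width):
    """Intrinsics for a horizontally flipped image (continuous convention u -> W - u)."""
    K_flip = K.copy()
    K_flip[0, 2] = width - K[0, 2]  # only cx changes; fx, fy and cy are untouched
    return K_flip

W = 1242
K = np.array([[720.0, 0.0, 600.0],   # cx = 600 is deliberately off-centre (W / 2 = 621)
              [0.0, 720.0, 185.0],
              [0.0, 0.0, 1.0]])
X = np.array([[1.0, 0.5, 10.0], [-2.0, 0.2, 25.0]])

uv = project(K, X)
uv_flipped_image = np.column_stack([W - uv[:, 0], uv[:, 1]])  # mirror the pixel columns
X_mirrored = X * np.array([-1.0, 1.0, 1.0])                    # mirror the scene
uv_from_flipped_K = project(flip_intrinsics(K, W), X_mirrored)  # matches uv_flipped_image
```

Projecting the mirrored scene with the original, un-updated K would be off by 2 * cx - W pixels horizontally (42 px here), which is the kind of small, pose-sensitive inconsistency that an intrinsics bug around flipping could introduce.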

Jun 20 '22 10:06 alwynmathew
