MicKey: Customized dataset and potential application to autonomous driving

Dear authors,

MicKey is great and inspiring work on feature representation, learning, and pose estimation.

I am exploring whether it can be applied to autonomous driving, e.g. visual odometry and localization. As a first attempt, I want to use the KITTI odometry dataset (sequential images with relative camera poses). I also saw that someone mentioned training MicKey on RealEstate10K and discussed it with you. However, I did not understand the point about the "unscaled dataset", and I still don't know why the translation loss does not work in that case. Could you explain this in more detail?

You have a lot of experience and knowledge in this area. Could you give me some suggestions before I start with a custom KITTI odometry dataset? Is there an efficient way to adapt the original dataloader?
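For context, here is a minimal sketch of how relative poses between image pairs could be derived from the KITTI odometry ground-truth files. It assumes the standard KITTI format, where each line of a poses file holds 12 values (a row-major 3x4 camera-to-world matrix for the left camera, with translation in metres). The function names `parse_kitti_pose` and `relative_pose` are hypothetical and are not part of the MicKey codebase; how the result would be wired into the original dataloader is not shown.

```python
def parse_kitti_pose(line):
    """Parse one line of a KITTI odometry poses file into (R, t).

    Assumes 12 whitespace-separated floats forming a row-major 3x4
    camera-to-world matrix [R | t].
    """
    values = [float(x) for x in line.split()]
    if len(values) != 12:
        raise ValueError("expected 12 values per pose line")
    rotation = [values[0:3], values[4:7], values[8:11]]
    translation = [values[3], values[7], values[11]]
    return rotation, translation


def relative_pose(pose_i, pose_j):
    """Return (R_ij, t_ij) mapping points from camera i into camera j.

    With camera-to-world poses X_w = R X_c + t, this gives
    R_ij = R_j^T R_i and t_ij = R_j^T (t_i - t_j).
    """
    r_i, t_i = pose_i
    r_j, t_j = pose_j
    # Transpose of R_j (the inverse of a rotation matrix).
    r_j_t = [[r_j[r][c] for r in range(3)] for c in range(3)]
    r_ij = [
        [sum(r_j_t[r][k] * r_i[k][c] for k in range(3)) for c in range(3)]
        for r in range(3)
    ]
    diff = [t_i[k] - t_j[k] for k in range(3)]
    t_ij = [sum(r_j_t[r][k] * diff[k] for k in range(3)) for r in range(3)]
    return r_ij, t_ij


# Hypothetical example: frame j is 1 m ahead of frame i along the z axis.
pose_a = parse_kitti_pose("1 0 0 0 0 1 0 0 0 0 1 0")
pose_b = parse_kitti_pose("1 0 0 0 0 1 0 0 0 0 1 1")
R_ab, t_ab = relative_pose(pose_a, pose_b)
```

Since KITTI translations are metric, the relative translation obtained this way carries real-world scale. In practice the same computation would usually be done with NumPy or PyTorch matrices; plain Python is used here only to keep the sketch dependency-free.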

Thanks for your time!

Sincerely, Haolin

Oct 15 '24 13:10 XJTU-Haolin

I directly replaced DINOv2 with ResNet50 (from torchvision.models), using the ResNet output from the 1/16-downscale layer, and the accuracy looks bad. Is there anything I should pay attention to?

Oct 19 '24 03:10 XJTU-Haolin

@XJTU-Haolin Hello Haolin, I also tried their algorithm on the KITTI dataset, and the accuracy was not satisfactory. After visualizing the feature points, I found that the extraction and matching of feature points were not ideal. I think this is mainly due to the insufficient resolution of features, so the positions of the feature points are not accurate, as mentioned in the limitations of their paper. However, they also mentioned, "Since MicKey predicts coordinate offsets over a coarse grid, it remains efficient while still providing correspondences with sub-pixel accuracy," which confuses me a bit.

I have attached the matching results for reference.
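One way to read the quoted sentence is that each coarse grid cell contributes a single keypoint whose position inside the cell is continuous. The sketch below illustrates that general idea only and is not MicKey's actual implementation: the function name `keypoints_from_offsets`, the stride of 14 pixels, and the offset range [0, 1) are assumptions made for illustration.

```python
def keypoints_from_offsets(offsets, stride):
    """Convert per-cell offsets on a coarse grid into image coordinates.

    offsets: H x W nested list of (dx, dy) pairs, each assumed to lie in
             [0, 1) and measured in units of one grid cell.
    stride:  size of one grid cell in pixels (assumed value, e.g. 14).
    Returns a list of (x, y) pixel coordinates, one per grid cell.
    """
    keypoints = []
    for row, row_offsets in enumerate(offsets):
        for col, (dx, dy) in enumerate(row_offsets):
            # The cell index fixes the coarse location; the offset refines it.
            keypoints.append(((col + dx) * stride, (row + dy) * stride))
    return keypoints


# Hypothetical 1 x 2 grid of predicted offsets.
example_offsets = [[(0.5, 0.5), (0.25, 0.75)]]
example_keypoints = keypoints_from_offsets(example_offsets, stride=14)
```

Under these assumptions the coordinates are not quantized to the grid, but there is at most one keypoint per cell, so keypoint density (and the choice of which structure inside a cell gets picked) is still bounded by the feature resolution.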

Feb 21 '25 15:02 wuchu1205

Hello, have you managed to solve this problem? I would like to use this for matching aerial images, but I have not tried it yet.

Jun 19 '25 07:06 AiYoWeiYL