Patch-NetVLAD

Can't reproduce RobotCar results

Open RuotongWANG opened this issue 3 years ago • 11 comments

Hi, I tried to reproduce your results on the RobotCar Seasons V2 test set by submitting to the challenge submission server. I used the released performance-focused model pre-trained on the MSLS dataset, but got these incorrect results: [image: results screenshot] I also tried the model pre-trained on Pitts30k, and those results are not correct either: [image: results screenshot] The results on other datasets, however, are normal. Is the model version I used wrong? Could you possibly release the model state that achieves the RobotCar results shown in the paper? Or could you provide the test-set results split by condition, as in Supplementary Table 1? Thank you so much.

Best regards,

RuotongWANG • Nov 21 '21 10:11

Hi,

Could you please describe the complete process you used to obtain these results? In particular, how do you map the best match to a pose?

Best, Tobias

Tobias-Fischer • Nov 21 '21 20:11

I directly used the pose of the best-matched reference image as the estimated pose of the query. I also evaluated the SuperGlue method with the same procedure and got normal results: [image: results screenshot] So I think something might be wrong with the configuration or the model state that I used.
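The "copy the pose of the best match" procedure described here can be sketched as follows. This is a minimal illustration, not code from the Patch-NetVLAD repository: the function names, the dict-based inputs, and the `name qw qx qy qz tx ty tz` line format are assumptions to check against the benchmark's submission instructions.

```python
from typing import Dict, Tuple

# Hypothetical pose type: (qw, qx, qy, qz, tx, ty, tz), assumed to already be
# in the convention the benchmark expects (see the dataset README).
Pose = Tuple[float, float, float, float, float, float, float]


def poses_from_top1(top1: Dict[str, str], ref_poses: Dict[str, Pose]) -> Dict[str, Pose]:
    """Assign each query the pose of its best-matched reference image."""
    return {query: ref_poses[ref] for query, ref in top1.items()}


def format_submission(query_poses: Dict[str, Pose]) -> str:
    """Write one line per query: 'name qw qx qy qz tx ty tz' (assumed format)."""
    lines = []
    for name in sorted(query_poses):
        values = " ".join(f"{v:.6f}" for v in query_poses[name])
        lines.append(f"{name} {values}")
    return "\n".join(lines) + "\n"


# Usage with made-up image names and a made-up pose.
top1 = {"rear/q1.jpg": "rear/r7.jpg"}
ref_poses = {"rear/r7.jpg": (1.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0)}
print(format_submission(poses_from_top1(top1, ref_poses)), end="")
```

Because each query simply inherits a database pose, this baseline can only be as accurate as the reference sampling, which is why the fine thresholds stay low even when retrieval is correct.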

RuotongWANG • Nov 22 '21 01:11

OK, @StephenHausler, let's sit down together at some point and find out where the culprit lies.

Tobias-Fischer • Nov 22 '21 03:11

Hi @StephenHausler, @Tobias-Fischer, a few days ago I ran the Pittsburgh_WPCA4096 and MSLS_WPCA4096 models on RobotCar Seasons and obtained the following results with NetVLAD retrieval:

Pittsburgh_WPCA4096: day-all: 7.3 29.2 91.3, night-all: 0.9 2.6 2.4 -> overall 5.9 23.3 73.9. In the paper you report for NetVLAD: 7.0 24.9 76.6

MSLS_WPCA4096: day-all: 6.2 23.1 83.5, night-all: 0, 0.5, 4.2 -> overall 5.0 18.58 67.8

I calculate the overall figure as the mean of the day and night numbers, weighted by the number of images taken at day and at night: overall = (day * 9300 + night * 2634) / (9300 + 2634)

For the Pittsburgh model, the difference from the reported numbers seems reasonable to me (about what you would expect between two training runs), so I think the model is probably fine and the problem lies in the Patch-NetVLAD feature extraction. I hope this helps with the issue.

marialeyvallina • Nov 25 '21 13:11

Hi, could you please tell me whether the dataset you ran the Pittsburgh_WPCA4096 model on is RobotCar Seasons V1 or V2?

HeartbreakSurvivor • Dec 12 '21 08:12

Hi @HeartbreakSurvivor, I ran RobotSeasons V2

marialeyvallina • Dec 13 '21 15:12

Hi, the issue is that RobotCar Seasons V1 has 9300 + 2634 = 11934 query images, while RobotCar Seasons V2 has 1872 query images. You said you ran on V2, but you calculated the overall result using:

overall = ( day * 9300 + night * 2634 ) / (9300 + 2634)

I don't know why, but that's not the main point.

What I really want to know is how you got these results. Did you just follow the Quick Start in README.md? I also ran the Pittsburgh_WPCA4096 model on RobotCar Seasons V2 but got wrong results and don't know why. I just ran feature_extract.py and feature_match.py to get 'PatchNetVLAD_predictions.txt', took the pose of the best-matched database image as the estimated pose for each query image, and submitted the results to the benchmark website, but got wrong answers. So I hope you can tell me how you obtained your results, which seem reasonable. Thanks.
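The last step of the pipeline described here, reading the predictions file and keeping the best match per query, might look like the sketch below. It assumes (not verified in this thread) a kapture-style pairs file in which lines starting with `#` are comments, each data line is `query, reference` with any extra comma-separated fields ignored, and the references for a query are listed best-first. The function name `top1_from_pairs` is hypothetical.

```python
from typing import Dict, Iterable


def top1_from_pairs(lines: Iterable[str]) -> Dict[str, str]:
    """Return {query: first-listed reference} from a pairs file.

    Assumes '#' starts a comment line and each data line is
    'query, reference[, ...]' with references listed best-first per query.
    """
    top1: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = [f.strip() for f in line.split(",")]
        query, reference = fields[0], fields[1]
        # Keep only the first (assumed highest-ranked) reference per query.
        top1.setdefault(query, reference)
    return top1


# Usage on in-memory lines with made-up names; for a real file,
# pass the open file object instead, e.g. top1_from_pairs(open(path)).
example = [
    "# kapture format: 1.0",
    "# query_image, map_image",
    "q1.jpg, r5.jpg",
    "q1.jpg, r2.jpg",
    "q2.jpg, r9.jpg",
]
print(top1_from_pairs(example))
```

Keeping the first listed reference is only correct if the file is already sorted best-first. If you build the ranking yourself from raw scores, the sort direction matters (distances vs. inlier counts, as raised at the end of this thread).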

HeartbreakSurvivor • Dec 15 '21 06:12

Hi again @HeartbreakSurvivor

Thank you very much for pointing this out; it seems I did mix up the two versions. The overall figure should instead be calculated as: overall = (day * 1443 + night * 429) / (1443 + 429). The day/night distribution is very similar between V1 and V2, so the results do not change much:

For Pittsburgh_WPCA4096: day-all: 7.3 29.2 91.3, night-all: 0.9 2.6 2.4 -> 5.8 | 23.1 | 73.2

For MSLS_WPCA4096: day-all: 6.2 23.1 83.5, night-all: 0, 0.5, 4.2 -> 4.8 | 17.9 | 65.3
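The corrected weighting is easy to check in a few lines. This is a small sketch: `weighted_overall` is a hypothetical helper, and 1443/429 are the V2 day/night query counts quoted in this comment.

```python
from typing import List, Sequence


def weighted_overall(day: Sequence[float], night: Sequence[float],
                     n_day: int, n_night: int) -> List[float]:
    """Combine per-condition recalls (one value per threshold) into an
    overall figure, weighting each condition by its number of queries."""
    total = n_day + n_night
    return [(d * n_day + n * n_night) / total for d, n in zip(day, night)]


# RobotCar Seasons V2 query counts as given in this thread.
N_DAY_V2, N_NIGHT_V2 = 1443, 429

msls = weighted_overall([6.2, 23.1, 83.5], [0.0, 0.5, 4.2], N_DAY_V2, N_NIGHT_V2)
print([round(x, 1) for x in msls])  # [4.8, 17.9, 65.3]
```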

I do indeed use feature_extract.py and feature_match.py, and then use the NetVLAD_predictions.txt file (I have not evaluated Patch-NetVLAD yet, only NetVLAD). You have to be careful with the pose format, as explained in the dataset README, but the retrieval itself should be fine.
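As one illustration of the kind of pose-format pitfall mentioned here, the sketch below inverts a camera-to-world pose into a world-to-camera rotation quaternion and translation. It assumes (not confirmed in this thread) that the reference poses are available as 4x4 camera-to-world matrices and that the submission expects the world-to-camera rotation as (qw, qx, qy, qz) plus translation; check both conventions against the dataset README before relying on it. The function names are hypothetical.

```python
import numpy as np


def rotmat_to_quat(R: np.ndarray) -> np.ndarray:
    """Convert a 3x3 rotation matrix to a unit quaternion (qw, qx, qy, qz)."""
    tr = np.trace(R)
    if tr > 0:
        s = np.sqrt(tr + 1.0) * 2.0  # s = 4 * qw
        q = np.array([0.25 * s,
                      (R[2, 1] - R[1, 2]) / s,
                      (R[0, 2] - R[2, 0]) / s,
                      (R[1, 0] - R[0, 1]) / s])
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2]) * 2.0  # s = 4 * qx
        q = np.array([(R[2, 1] - R[1, 2]) / s,
                      0.25 * s,
                      (R[0, 1] + R[1, 0]) / s,
                      (R[0, 2] + R[2, 0]) / s])
    elif R[1, 1] > R[2, 2]:
        s = np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2]) * 2.0  # s = 4 * qy
        q = np.array([(R[0, 2] - R[2, 0]) / s,
                      (R[0, 1] + R[1, 0]) / s,
                      0.25 * s,
                      (R[1, 2] + R[2, 1]) / s])
    else:
        s = np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1]) * 2.0  # s = 4 * qz
        q = np.array([(R[1, 0] - R[0, 1]) / s,
                      (R[0, 2] + R[2, 0]) / s,
                      (R[1, 2] + R[2, 1]) / s,
                      0.25 * s])
    return q / np.linalg.norm(q)


def world_to_cam_from_cam_to_world(T_wc: np.ndarray):
    """Invert a 4x4 camera-to-world pose (x_w = R_wc @ x_c + c).

    Returns (q, t) such that x_c = R(q) @ x_w + t.
    """
    R_wc = T_wc[:3, :3]
    c = T_wc[:3, 3]        # camera centre in world coordinates
    R_cw = R_wc.T          # inverse of a rotation is its transpose
    t_cw = -R_cw @ c       # maps the camera centre to the origin
    return rotmat_to_quat(R_cw), t_cw


# Usage: camera rotated +90 degrees about z, centred at (1, 0, 0).
T = np.eye(4)
T[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T[:3, 3] = [1.0, 0.0, 0.0]
q, t = world_to_cam_from_cam_to_world(T)
print(np.round(q, 4), np.round(t, 4))
```

Submitting camera-to-world poses where world-to-camera ones are expected (or vice versa) typically places the camera in the wrong location, so most queries fall outside every threshold, while retrieval itself may be fine.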

marialeyvallina • Dec 15 '21 09:12

Thank you very much for the reply; I will check my code.

HeartbreakSurvivor • Dec 15 '21 14:12

Hi again @marialeyvallina, it seems I ran into the same problem as you. I ran the Pittsburgh_WPCA4096 model on RobotCar Seasons V1 and obtained the following results with NetVLAD retrieval:

day-all: 6.3 / 25.4 / 87.6, night-all: 0.8 / 2.5 / 16.5

This seems reasonable to me. But when I use Patch-NetVLAD retrieval, the results seem wrong:

day-all: 2.1 / 8.3 / 36.7, night-all: 0.1 / 1.3 / 13.9

I tested Pittsburgh_WPCA4096 on RobotCar Seasons V1 twice, just in case, and got the same result both times: [image: results screenshot]

So I agree with your point: the problem may lie in the Patch-NetVLAD feature extraction or feature matching. Hi @Tobias-Fischer, any hints about this issue? Or did you test on the RobotCar Seasons V1 dataset? If so, could you please provide the test results?

HeartbreakSurvivor • Dec 17 '21 04:12

Hi, @StephenHausler and I will be looking at this. However, the holiday season is coming up and we're tied up with other projects.

We haven't ever checked V1 as far as I remember.

I'm assuming you're aware that lower scores are better for NetVLAD (they are distances), whereas higher scores are better for Patch-NetVLAD (they are numbers of inliers)? So Patch-NetVLAD needs an argmax instead of an argmin to get the top-1 match.
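The argmin/argmax distinction can be made concrete with a small NumPy sketch. The (num_queries × num_references) score-matrix layout, the toy values, and the helper name `top1_indices` are assumptions for illustration, not the repository's code.

```python
import numpy as np


def top1_indices(scores: np.ndarray, higher_is_better: bool) -> np.ndarray:
    """Pick the best reference per query from a (num_queries, num_refs) matrix."""
    if higher_is_better:
        return np.argmax(scores, axis=1)  # e.g. Patch-NetVLAD inlier counts
    return np.argmin(scores, axis=1)      # e.g. NetVLAD descriptor distances


# Toy scores for 2 queries against 3 references.
netvlad_dist = np.array([[0.9, 0.2, 0.5],
                         [0.1, 0.7, 0.4]])
inliers = np.array([[3, 40, 12],
                    [25, 2, 30]])

print(top1_indices(netvlad_dist, higher_is_better=False))  # [1 0]
print(top1_indices(inliers, higher_is_better=True))        # [1 2]
# Wrong direction for inlier counts: this picks the *worst* match.
print(top1_indices(inliers, higher_is_better=False))       # [0 1]
```

The same applies to full rankings: sort distances ascending (`np.argsort(d, axis=1)`) but inlier counts descending (`np.argsort(-s, axis=1)`).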

Tobias-Fischer • Dec 17 '21 05:12