
Tokyo 24/7 performance

Open bashar-tom opened this issue 5 years ago • 18 comments

Thanks for the remarkable work. Have you run the experiment on the Tokyo dataset? It seems the performance is much lower than the results in the paper.

bashar-tom avatar May 17 '19 15:05 bashar-tom

I haven't done any extensive testing on it, other than a first couple of iterations just to see that it trains. How much worse is the performance?

Nanne avatar May 20 '19 06:05 Nanne

To make things clear: I have only run a test on the Tokyo 24/7 dataset as stated in the paper, using the model provided by you, and the result was close to random (Recall@1: 0.1778). Separately, I have a TensorFlow model converted from the paper authors' publicly available MATLAB model, and its test results are fine (Recall@1: 0.6762, Recall@5: 0.7905). As a last attempt, I also converted that TensorFlow model and loaded it into your code. It did not give exactly the same results, due to the multiple conversions, but it still performs better than the model provided by you.

bashar-tom avatar May 20 '19 10:05 bashar-tom

That's very surprising, but a clear indication that something is definitely going wrong. I won't have much time to work on this, but I'll try to see if I can figure out what's going on. It might just be that the data loading for the Tokyo 24/7 dataset is going wrong; see also https://github.com/Nanne/pytorch-NetVlad/issues/9

Nanne avatar May 20 '19 10:05 Nanne

I have rechecked the code; it was a small bug on my side when importing the Tokyo dataset. I have redone the tests and the results are reasonable now, but still below the performance of the model provided by the paper's authors. Tests performed on the Tokyo 24/7 test set using your model: Recall@1: 0.5238, Recall@5: 0.6857, Recall@10: 0.7397, Recall@20: 0.7873

Using the authors' model: Recall@1: 0.6762, Recall@5: 0.7905, Recall@10: 0.8095, Recall@20: 0.8381

In general, the model does not generalize as well as the one provided by the paper's authors; performance on the Pitts30k test set is about 4% lower using your model: Recall@1: 0.8190, Recall@5: 0.9123, Recall@10: 0.9366, Recall@20: 0.9575

Using the authors' model: Recall@1: 0.8506, Recall@5: 0.9217, Recall@10: 0.9445, Recall@20: 0.9594

bashar-tom avatar May 27 '19 11:05 bashar-tom

Thanks for doing these comparisons; it seems it indeed doesn't generalise too well to the Tokyo 24/7 dataset. Previously I noticed that if I start training from the matconvnet VGG16 weights rather than the PyTorch VGG16 weights, it does a little better, but importing the weights was very hacky and error-prone. Even so, it seems the base network is quite a big factor, even when training goes well.

As far as I've been able to tell there are no major differences in the training procedure, and I've done a decent amount of debugging on this, so perhaps it's now just down to hyperparameter tuning to boost performance.

Nanne avatar May 28 '19 10:05 Nanne

Thanks for your answers. I really couldn't figure out why it is not generalizing well; it seems to be more related to the training process. I have trained a new model using ResNet18 pretrained on Places365 (https://github.com/CSAILVision/places365), taking the features from the conv4_x block. The model learns the Pitts30k dataset well and achieved the following on the val set: Recall@1: 0.8644, Recall@5: 0.9494, Recall@10: 0.9694, Recall@20: 0.9807. This configuration is also about 5 times faster than the VGG16 one, but it still doesn't generalize well to the Tokyo 24/7 test set: Recall@1: 0.3206, Recall@5: 0.4476, Recall@10: 0.4825, Recall@20: 0.5206

I hope you could give me a hint on how to start debugging this issue. Thanks in advance.
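The backbone setup is roughly the following (a minimal sketch; the checkpoint filename and the 'module.' key cleanup follow the places365 release and may need adjusting):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Load the Places365-pretrained ResNet18 from CSAILVision/places365.
# The checkpoint name and the 'module.' prefix (saved via DataParallel)
# follow that repo's released files; adjust if your download differs.
checkpoint = torch.load('resnet18_places365.pth.tar', map_location='cpu')
state_dict = {k.replace('module.', ''): v
              for k, v in checkpoint['state_dict'].items()}

resnet = models.resnet18(num_classes=365)
resnet.load_state_dict(state_dict)

# Keep everything up to and including layer3 (= conv4_x in the ResNet
# paper's naming), dropping layer4, avgpool and fc. The output is a
# 256-channel feature map at 1/16 of the input resolution.
backbone = nn.Sequential(*list(resnet.children())[:-3])

feats = backbone(torch.randn(1, 3, 480, 640))   # -> (1, 256, 30, 40)
```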

bashar-tom avatar May 29 '19 14:05 bashar-tom

Debugging why something doesn't generalise to another dataset is quite tricky, but perhaps you can try some checkpoints from earlier in the training process, and see if it overfits to the Pittsburgh dataset.

What is the performance of your pre-trained Resnet18 on tokyo24/7 after just 1 epoch on the Pittsburgh set?
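For checking each checkpoint, a recall@N computation along these lines should do (a sketch; it assumes you have already extracted L2-normalised descriptors for the database and queries, and have the ground-truth positive indices per query):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def recall_at_n(db_feats, q_feats, positives, n_values=(1, 5, 10, 20)):
    """db_feats, q_feats: L2-normalised descriptors, shape (num, dim).
    positives[i]: array of database indices that are true matches for query i."""
    knn = NearestNeighbors(n_jobs=-1).fit(db_feats)
    _, preds = knn.kneighbors(q_feats, n_neighbors=max(n_values))

    correct_at_n = np.zeros(len(n_values))
    for q_idx, pred in enumerate(preds):
        for i, n in enumerate(n_values):
            # a query counts as correct at n (and all larger n) if any
            # of its top-n retrieved database images is a true positive
            if np.any(np.isin(pred[:n], positives[q_idx])):
                correct_at_n[i:] += 1
                break
    return correct_at_n / len(q_feats)
```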

Nanne avatar Jun 02 '19 07:06 Nanne

Thanks for looking into my problem. I know it is tricky, and I hope we can find a solution. The performance after epoch 1 on the Pitts30k val set is: Recall@1: 0.8084, Recall@5: 0.9188, Recall@10: 0.9468, Recall@20: 0.9678

and on Tokyo 24/7: Recall@1: 0.2857, Recall@5: 0.4635, Recall@10: 0.5079, Recall@20: 0.5683

Also, the performance on the test set is always 3 to 4% lower than on the val set; is that normal?

Finally, I also tested on the TokyoTM val set, which gave, for ResNet18: Recall@1: 0.9939, Recall@5: 0.9989, Recall@10: 0.9993, Recall@20: 0.9996. That makes sense when compared against the evaluation using the pretrained VGG16 model you provided: Recall@1: 0.9695, Recall@5: 0.9949, Recall@10: 0.9971, Recall@20: 0.9986

bashar-tom avatar Jun 03 '19 15:06 bashar-tom

I saw similar performance differences between val and test, so that seems to be normal, yes.

Performance after 1 epoch seems similar to what is reported in the NetVLAD paper for their off-the-shelf networks, so I assume that during training the features become 'too specific' to the Pittsburgh dataset; I'm not sure why the model from the original authors doesn't suffer from this. You are comparing models which were all trained on the Pittsburgh set only, right?

Nanne avatar Jun 05 '19 12:06 Nanne

Thank you for your answer, and sorry for the late reply. Yes, all the models were trained on Pittsburgh. I have also tested the performance on other datasets from this paper.

It turns out the model provided by the authors also performed far better there than the model provided by you. I know this is not easy to figure out, but I would appreciate any hint.

bashar-tom avatar Jul 16 '19 14:07 bashar-tom

The TensorFlow repo you linked appears to re-use the original MATLAB models, so I think the problem is essentially the same one: following the same training procedure doesn't give the same result in the end.

One important factor in the whole pipeline is the base model that is used, and the pre-trained weights it's initialized with. I guess you could explore using the matconvnet weights to initialize your model and then train on top of that.
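If you go that route, the weight import could look roughly like this (a very rough sketch: the .mat filename is the standard MatConvNet release, and the field layout assumes the simplenn format, which varies between releases, so inspect the file first):

```python
import numpy as np
import scipy.io
import torch
import torchvision.models as models

# Rough sketch of porting MatConvNet VGG16 conv weights into torchvision's
# VGG16. Filename and field names are assumptions; some releases use
# 'filters'/'biases' instead of 'weights', so check the file contents first.
mat = scipy.io.loadmat('imagenet-vgg-verydeep-16.mat',
                       struct_as_record=False, squeeze_me=True)

vgg = models.vgg16()
torch_convs = [m for m in vgg.features if isinstance(m, torch.nn.Conv2d)]
mat_convs = [l for l in mat['layers'] if l.type == 'conv']

for t_conv, m_conv in zip(torch_convs, mat_convs):
    w, b = m_conv.weights  # MatConvNet stores filters as H x W x C_in x C_out
    t_conv.weight.data = torch.from_numpy(
        np.ascontiguousarray(w.transpose(3, 2, 0, 1)))  # -> C_out, C_in, H, W
    t_conv.bias.data = torch.from_numpy(b.ravel().astype(np.float32))
```

Note that the MatConvNet models also expect different input preprocessing (raw 0-255 pixel values minus a mean image, rather than the PyTorch normalisation), which may matter as much as the weights themselves.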

Nanne avatar Jul 17 '19 11:07 Nanne

Hi @bashar-tom, have you solved this issue? I ran into the same problem. Thanks!

yxgeee avatar Oct 11 '19 06:10 yxgeee

I achieved the performance of

Recall Scores:
  top-1     86.1%
  top-5     93.0%
  top-10    95.0%

by training from conv3 of VGG16 with a learning rate of 0.0001, and applying PCA + whitening followed by L2 normalization (as introduced in the original paper) at inference.
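The post-processing step looks roughly like this (a minimal sketch; db_feats and q_feats stand for the raw NetVLAD descriptors, and the 4096 output dimensions follow the paper's choice — both are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

def fit_pca_whitening(db_feats, n_components=4096):
    # fit once on the database descriptors; n_components must not
    # exceed min(num_samples, feature_dim)
    return PCA(n_components=n_components, whiten=True).fit(db_feats)

def apply_pca(pca, feats):
    # project, whiten, then L2-normalise, as in the NetVLAD paper
    return normalize(pca.transform(feats))

# usage (db_feats: (N, D) array, q_feats: (M, D) array):
# pca = fit_pca_whitening(db_feats)
# db_red, q_red = apply_pca(pca, db_feats), apply_pca(pca, q_feats)
```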

yxgeee avatar Oct 14 '19 09:10 yxgeee

> I achieved the performance of
>
> Recall Scores:
>   top-1     86.1%
>   top-5     93.0%
>   top-10    95.0%
>
> by training from conv3 of VGG16 with a learning rate of 0.0001, and applying PCA + whitening followed by L2 normalization (as introduced in the original paper) at inference.

Training and testing were on the Pitts30k dataset.

yxgeee avatar Oct 14 '19 09:10 yxgeee

Hi @yxgeee, I trained the model using ResNet18, taking the features from the output of block4 and setting all layers to trainable. I got the best performance when I computed the PCA offline once, on features extracted from block4 on the Pitts30k train set, and then baked it into the network as a Conv layer, with the layer's weights set to the generated U matrix and the bias derived from the mean of the features.
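In code, the trick looks roughly like this (a minimal sketch; train_feats stands for the offline-extracted Pitts30k train descriptors, and note the conv bias works out to -U @ mean rather than the mean itself):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

def pca_conv_layer(train_feats, out_dim=4096):
    """Fold an offline-fitted PCA into a fixed 1x1 conv.
    train_feats: (N, D) descriptors; out_dim is just an example value."""
    D = train_feats.shape[1]
    pca = PCA(n_components=out_dim).fit(train_feats)

    W = torch.from_numpy(pca.components_.astype(np.float32))  # (out_dim, D)
    mu = torch.from_numpy(pca.mean_.astype(np.float32))       # (D,)

    layer = nn.Conv2d(D, out_dim, kernel_size=1, bias=True)
    layer.weight.data = W.view(out_dim, D, 1, 1)
    # y = W (x - mu) = W x + (-W @ mu), so the bias absorbs the mean
    layer.bias.data = -W @ mu
    for p in layer.parameters():
        p.requires_grad = False  # PCA stays fixed during further training
    return layer

# usage on a batch of descriptors x with shape (B, D):
# reduced = pca_conv_layer(train_feats)(x.view(-1, D, 1, 1)).flatten(1)
```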

bashar-tom avatar Oct 15 '19 16:10 bashar-tom

> Hi @yxgeee, I trained the model using ResNet18, taking the features from the output of block4 and setting all layers to trainable. I got the best performance when I computed the PCA offline once, on features extracted from block4 on the Pitts30k train set, and then baked it into the network as a Conv layer, with the layer's weights set to the generated U matrix and the bias derived from the mean of the features.

Hi @bashar-tom, what are your optimizer settings? Thanks a lot!

yxgeee avatar Oct 16 '19 04:10 yxgeee

> I have rechecked the code; it was a small bug on my side when importing the Tokyo dataset. I have redone the tests and the results are reasonable now, but still below the performance of the model provided by the paper's authors. Tests performed on the Tokyo 24/7 test set using your model: Recall@1: 0.5238, Recall@5: 0.6857, Recall@10: 0.7397, Recall@20: 0.7873
>
> Using the authors' model: Recall@1: 0.6762, Recall@5: 0.7905, Recall@10: 0.8095, Recall@20: 0.8381
>
> In general, the model does not generalize as well as the one provided by the paper's authors; performance on the Pitts30k test set is about 4% lower using your model: Recall@1: 0.8190, Recall@5: 0.9123, Recall@10: 0.9366, Recall@20: 0.9575
>
> Using the authors' model: Recall@1: 0.8506, Recall@5: 0.9217, Recall@10: 0.9445, Recall@20: 0.9594

@bashar-tom Did you use all the database and query images of tokyoTM_train? Could you please share the trained model?

Ahmedest61 avatar Feb 06 '21 16:02 Ahmedest61

> Thanks for your answers. I really couldn't figure out why it is not generalizing well; it seems to be more related to the training process. I have trained a new model using ResNet18 pretrained on Places365 (https://github.com/CSAILVision/places365), taking the features from the conv4_x block. The model learns the Pitts30k dataset well and achieved the following on the val set: Recall@1: 0.8644, Recall@5: 0.9494, Recall@10: 0.9694, Recall@20: 0.9807. This configuration is also about 5 times faster than the VGG16 one, but it still doesn't generalize well to the Tokyo 24/7 test set: Recall@1: 0.3206, Recall@5: 0.4476, Recall@10: 0.4825, Recall@20: 0.5206
>
> I hope you could give me a hint on how to start debugging this issue. Thanks in advance.

Hi @bashar-tom, I ran into some problems training with ResNet18 as the pretrained network. Could you share your project for this? Thank you very much.

bo1234566 avatar Mar 03 '21 14:03 bo1234566