
GeM pooling parameter

Open andrefaraujo opened this issue 5 years ago • 4 comments

Hi @filipradenovic ,

For your experiment on networks with whitening learned end-to-end, with triplet loss, trained on the Google Landmarks dataset 2018: could you share which value the GeM pooling parameter p converged to?

If you could share a learning curve showing the evolution of p over the training run, that would be even better :)

Thanks!

andrefaraujo avatar Sep 05 '19 17:09 andrefaraujo

The converged p values:

- gl18-tl-resnet50-gem-w: 2.8180
- gl18-tl-resnet101-gem-w: 2.8640
- gl18-tl-resnet152-gem-w: 2.9059

I don't have the evolution of p over the training run at hand right now; if I manage to find it, I will update this response with the curve.
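For context, p is just a single learnable parameter inside the GeM layer. A minimal PyTorch sketch along the lines of the pooling layer in this repo (exact eps/clamp details may differ slightly):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling with a single learnable exponent p."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # trained jointly with the rest of the network
        self.eps = eps

    def forward(self, x):
        # x: (B, C, H, W) feature map -> (B, C, 1, 1) pooled descriptor
        return F.avg_pool2d(x.clamp(min=self.eps).pow(self.p),
                            (x.size(-2), x.size(-1))).pow(1.0 / self.p)

# The converged value (or a per-epoch curve) can be read off directly,
# assuming the network exposes its pooling layer as model.pool:
# print(model.pool.p.item())
```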

filipradenovic avatar Sep 13 '19 22:09 filipradenovic

Thanks! No problem if you don't have the curve; I am definitely more interested in the final value.

I am training a ResNet50 with ArcFace loss, GeM pooling, and a whitening layer, but somehow the GeM power p keeps converging to 1 (i.e., average pooling). I tried accelerating the LR for p (as done in your code), but it didn't really help. I guess this is hard to debug, but if you have any thoughts on what might be wrong here, please let me know :)
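To be concrete, by "accelerating the LR" I mean giving p its own optimizer parameter group with a boosted learning rate, roughly like this (a sketch; the model.features / model.pool / model.whiten attribute names and the 10x factor are assumptions about the setup, not exact repo code):

```python
import torch.optim as optim

base_lr = 5e-7  # illustrative value

# model is assumed to expose: .features (CNN backbone), .pool (GeM), .whiten (FC whitening).
# Putting the GeM parameter p in its own group lets its LR be scaled independently.
optimizer = optim.Adam([
    {'params': model.features.parameters(), 'lr': base_lr},
    {'params': model.pool.parameters(),     'lr': base_lr * 10},  # boosted ("accelerated") LR for p
    {'params': model.whiten.parameters(),   'lr': base_lr},
], lr=base_lr, weight_decay=1e-6)
```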

andrefaraujo avatar Sep 13 '19 23:09 andrefaraujo

I haven't tried training with ArcFace loss, but that should not be the problem. Maybe try the opposite: reduce the LR for p only and observe how it changes. At some point during training it may start moving towards values other than 1. At that point you can try increasing the learning rate again, or just keep it at the value where the learning of p started "working".
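Concretely, if p sits in its own parameter group, you can change just that group's learning rate during training, e.g. (the index here assumes the group ordering from the sketch above):

```python
# Lower the LR of the pooling (GeM p) parameter group only; index 1 matches
# the ordering in the sketch above and should be adapted to your setup.
optimizer.param_groups[1]['lr'] = base_lr * 0.1

# Once p starts moving away from 1, scale it back up (or keep whatever value worked):
# optimizer.param_groups[1]['lr'] = base_lr * 10
```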

filipradenovic avatar Sep 18 '19 03:09 filipradenovic

I plotted loss vs. epoch for Triplet (margin=0.5), Contrastive (margin=0.5), and ArcFace (margin=0.5, scale=1.0) losses, using the following command (swapping the --loss argument between 'triplet', 'contrastive', and 'arcface' across runs).

python -m cirtorch.examples.train ./log \
	--gpu-id '0' \
	--print-freq 1000 \
	--epochs 200 \
	--training-dataset 'retrieval-SfM-120k' \
	--test-datasets    'roxford5k,rparis6k' \
	-a 'resnet101' \
	--pool 'gem' \
	--loss 'triplet' \
	--loss-margin 0.5 \
	--optimizer 'adam' \
	-lr 5e-7 \
	--whitening \
	--neg-num 5 \
	--query-size=25 \
	--pool-size=300 \
	--batch-size 5 \
	--image-size 362

I wonder if we can conclude/generalize that ArcFace loss outperforms both Contrastive and Triplet losses for CNNs that use GeM pooling for global feature extraction?

This is pointed out in the paper Unifying Deep Local and Global Features for Image Search:

Global features. For global feature learning, we adopt a suitable loss function with L2-normalized classifier weights W, followed by scaled softmax normalization and cross-entropy loss [59]; this is sometimes referred to as "cosine classifier". Additionally, we adopt the ArcFace margin [11], which has shown excellent results for global feature learning by inducing smaller intra-class variance.
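For clarity, the loss described there is a cosine classifier with an additive angular margin. A minimal ArcFace-style head in PyTorch (my own simplified sketch, not code from this repo or the DELG paper) could look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Cosine classifier with an additive angular margin (ArcFace-style), simplified."""
    def __init__(self, feat_dim, num_classes, scale=1.0, margin=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale
        self.margin = margin

    def forward(self, features, labels):
        # Cosine similarity between L2-normalized features and L2-normalized class weights
        cos = F.linear(F.normalize(features), F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        # Add the angular margin only to the target-class logit
        theta = torch.acos(cos)
        target = F.one_hot(labels, cos.size(1)).bool()
        cos_m = torch.where(target, torch.cos(theta + self.margin), cos)
        # Scaled softmax + cross-entropy
        return F.cross_entropy(self.scale * cos_m, labels)
```

(For what it's worth, ArcFace is usually run with a much larger scale, e.g. 30 to 64, than the scale=1.0 I used above, which might affect the comparison.)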

Loss vs. epoch plots (attached):

- Triplet, m=0.5: triplet_200epochs
- Contrastive, m=0.5: contrastive_200epochs
- ArcFace, m=0.5, scale=1.0: arcface_200epochs

mrgransky avatar May 13 '21 11:05 mrgransky