CloserLookFewShot
Can't reach the accuracy as reported
I have trained your code with loss_type = 'dist' & train_aug (baseline++), but the 5-shot accuracy on mini-ImageNet is only 56%, much lower than the 66% you report. Could you give a reasonable explanation?
Thanks for your report. Are you using the latest version? I will also check whether I accidentally changed something in the last update...
Hello, I have rerun the code and it reaches 66%. Is your training loss in the last epoch (399) around 1.5?
Also, since 56% looks like the accuracy without data augmentation, can you check whether your commands are exactly the same as follows?
```
python ./train.py --dataset miniImagenet --model Conv4 --method baseline++ --train_aug
python ./save_features.py --dataset miniImagenet --model Conv4 --method baseline++ --train_aug
python ./test.py --dataset miniImagenet --model Conv4 --method baseline++ --train_aug
```
Note that the train_aug option is still required for testing since it indicates the correct model path.
Thanks, I've found the problem: it is because I changed the initialization of the classification weights.
But can you explain why you use weight normalization (WeightNorm) in baseline++?
For efficient updates. Weight normalization reparameterizes each weight vector into a direction and a length, and for baseline++ only the direction of the weight matters. With weight normalization, we can therefore update the direction of the weights directly. Without weight normalization, the 5-shot accuracy of baseline++ on mini-ImageNet with a ResNet10 backbone would be as low as ~60%; with weight normalization, it is ~76%.
For detailed information on weight normalization, see this paper.
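For reference, here is a minimal PyTorch sketch of that reparameterization, using the standard torch.nn.utils.weight_norm with made-up dimensions (not the repo's actual distLinear):

```python
import torch
import torch.nn as nn

# Weight normalization reparameterizes each weight vector w as
#   w = g * v / ||v||
# so its length (g) and its direction (v / ||v||) are learned separately.
fc = nn.utils.weight_norm(nn.Linear(64, 5, bias=False), name='weight', dim=0)

x = torch.randn(8, 64)
fc(x).sum().backward()

# Gradients flow to the length and direction parameters independently,
# which is what allows the direction of the weights to be updated directly.
print(fc.weight_g.grad.shape)  # torch.Size([5, 1])  -- one length per class
print(fc.weight_v.grad.shape)  # torch.Size([5, 64]) -- one direction per class
```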
But after weight normalization, the calculated 'cos_dist' after nn.Linear in class distLinear is larger than 1.0 or less than -1.0, which confuses me.
Sorry for the late reply. This is a good observation I did not notice. The problem is in the line 'cos_dist = self.L(x_normalized)'. I intended to use the forward operation to perform a matrix product between x_normalized and self.L.weight.data, but I forgot that when self.L is wrapped by WeightNorm, its forward operation uses self.L.weight_g and self.L.weight_v, not self.L.weight.
Thus, the weight is not normalized, so cos_dist falls outside the range [-1, 1]. It is, however, approximately a scaled cosine distance.
From my observation, self.L.weight_g.data (the norm of the weight vectors) does not differ much among classes. Taking my baseline++ model trained on miniImagenet with the Conv4 backbone as an example, self.L.weight_g.data for the first 64 classes (i.e., the classes with training data) is between 20 and 35, so the output is roughly the cosine distance scaled by about 27.
I will mark this issue in my code. Thanks for finding this problem!
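To make the scaling concrete, here is a small illustrative sketch (made-up dimensions and a filled-in norm of 27, not the repo's distLinear) comparing what the weight-normed forward pass returns with the unit-range cosine that was intended:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, n_way = 64, 5
L = nn.utils.weight_norm(nn.Linear(feat_dim, n_way, bias=False), name='weight', dim=0)

# Mimic the trained class norms mentioned above (~20-35).
with torch.no_grad():
    L.weight_g.fill_(27.0)

x = torch.randn(8, feat_dim)
x_normalized = F.normalize(x, p=2, dim=1)

# What the forward pass actually computes: x_normalized @ (g * v / ||v||)^T,
# i.e. a cosine similarity scaled per class by the learned norm g.
scaled_cos = L(x_normalized)

# The intended cosine in [-1, 1]: normalize the class weights as well.
w_normalized = F.normalize(L.weight_v, p=2, dim=1)
true_cos = x_normalized @ w_normalized.t()

print(torch.allclose(scaled_cos, 27.0 * true_cos, atol=1e-4))  # True
```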
So if you use the correct normalization, what performance can the baseline++ method get?
Thank you.
If you use weight normalization, the norm of each class will be different in both training and fine-tuning, which is quite different from the baseline++ method reported in your paper.
I think your original idea was to realize the formulation "output = cos_t * scale_factor", where the scale factor should be the same for all classes; that is why you use scale_factor = 2 for mini-ImageNet and scale_factor = 10 for Omniglot. I tried scale_factor = 2 after the fix for baseline++ and the result is very bad (as I said, 56% for 5-shot testing). Setting scale_factor = 30 reaches a similar result for the Conv4 network, but it is still not as good as what you report for the ResNet18 network.
I think it is interesting that learning an adaptive weight norm for each class can actually reach a better result.
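To be concrete, a minimal sketch of the fixed-scale formulation I have in mind (illustrative names and defaults only, not the repo's distLinear):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Cosine classifier with one shared scale: output = scale_factor * cos(theta)."""
    def __init__(self, feat_dim, n_classes, scale_factor=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.scale_factor = scale_factor  # identical for all classes, unlike a learned per-class norm

    def forward(self, x):
        x_norm = F.normalize(x, p=2, dim=1)
        w_norm = F.normalize(self.weight, p=2, dim=1)
        cos = x_norm @ w_norm.t()           # strictly within [-1, 1]
        return self.scale_factor * cos      # scaled scores fed to cross-entropy
```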
That is a very good insight! Thanks for sharing!
Thanks for helping to reply to this problem (and sorry for the late reply, as I have been busy recently). Yes, it turns out to be different from what I meant to use, and also from what I describe in the paper... I have marked the issue in a code comment and will release a revised paper on arXiv. It would be interesting to address this class-wise normalization issue in detail in future work.
Hi, if the weight-normed Linear only uses g and v in the forward pass, what is self.L.weight used for? Is it updated? Can it be deleted directly?
Yes, with WeightNorm, self.L.weight would not be used. I have marked this more clearly in the code, thanks!
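For anyone who wants to verify this, here is a minimal sketch of what happens to the parameter list, using the standard torch.nn.utils.weight_norm rather than repo-specific code:

```python
import torch
import torch.nn as nn

L = nn.Linear(64, 5, bias=False)
print([n for n, _ in L.named_parameters()])   # ['weight']

L = nn.utils.weight_norm(L, name='weight', dim=0)
print([n for n, _ in L.named_parameters()])   # ['weight_g', 'weight_v']

# 'weight' is no longer a learnable Parameter: a forward pre-hook rebuilds it
# from weight_g and weight_v on every call, so gradients only reach g and v,
# and any manual change to L.weight.data is overwritten at the next forward.
out = L(torch.randn(2, 64))
```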
"Without weight normalization, the 5-shot accuracy of baseline++ on mini-ImageNet with a ResNet10 backbone would be as low as ~60%." But in the paper, the baseline is about 75%. I have some questions about that.
Yes, as discussed above, the number reported in the paper is actually with weight normalization. However, you can follow the parameters in issue #12 to get 75% accuracy without weight normalization. Thanks!
@KaleidoZhouYN The scale_factor doesn't really matter in my experiments (I've tried 30~60 after reading your suggestion), where I have a large number of classes. If your loss function is cross entropy, the scaled score gets "normalized" again through the softmax inside cross entropy. However, my backbone is a ResNet101, and in my experiments, the deeper the backbone is, the less sensitive it is to those "factors".
@nicolefinnie I think the reason is that "WeightNorm" is applied in the latest code, so the class-wise learnable norms can actually play the role of the "factors".