Adaptive-Attention
Some unexplained parameters in your code
Excellent work, but the code is a bit messy and not friendly to a novice like me. Could you explain the meaning of some key parameters in the code, such as "-d, double", "use_perm", and "use_inter_class"? What is the difference between "weighted" and "unweighted"? There are many details in the code that are not covered in the paper :)
Looking forward to your reply!
Sure, this work was done a very long time ago, and I'll take some time to clean up the code in the future. Some ablation experiments are not included in our paper. For your questions:
- For "-d double", we ablate to use two reweighting module, one for generating the spatial attention map and one for classification.
- For "use_permute" as indicated in paper Section 3.3, we use the symmetric form of the function, which means we also "permute" the role of query and reference and compute a set of new scores. Then two scores are merged together for the final prediction. You can regard it as kind of model ensemble.
- For "inter_class_loss", you can refer to https://github.com/zihangJiang/Adaptive-Attention/blob/45eeb8fd629a81eebb3c8a8b869551f4f8738325/src/cfr_loss.py#L147-L148 which is the loss with regard to the reweighting weight. The aim is to force the Meta-weight of the same class also to be similar.
- "weighted" means to use our reweighting strategy as explained in Section 3.2 in our paper.
Hope this answers your questions.