GHM_Detection
Why use nonempty bins rather than all bins?
Why do you divide the weights by the number of nonempty bins (`n`) rather than the number of all bins (`self.bins`)? https://github.com/libuyu/GHM_Detection/blob/3647287710416c91077805d504349fb947c2e9bd/mmdetection/mmdet/core/loss/ghm_loss.py#L54
I think `M` is the number of all bins in the paper. Am I missing something?
@DHPO You are right. In the paper, we define M as the number of all bins. In the latest version of our code, we use the number of valid (non-empty) bins instead.
Suppose that you have 100 bins and all the examples have the same gradient norm of 0.8 (although this is impossible in practice), so they all fall into a single bin. Then each example gets a harmonizing parameter of 1/100 according to the original equation. With 10000 bins, the parameter becomes 1/10000. But in these cases we would like a harmonizing parameter of 1 for all examples, since they should be treated equally and should not be down-weighted. The harmonizing parameters also should not depend on the number of bins. So we think the number of valid bins is the more reasonable choice.
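To make the arithmetic above concrete, here is a minimal sketch in plain Python. It is not the code from `ghm_loss.py`: the function name `ghm_weights` and the flag `use_nonempty` are hypothetical, it assumes gradient norms lie in [0, 1] with equal-width bins, and it leaves out anything else the real loss may do. It only computes the per-example weight N / (R * M), where R is the number of examples in the example's bin and M is either the total or the non-empty bin count.

```python
def ghm_weights(grad_norms, num_bins, use_nonempty=True):
    """Per-example harmonizing weights N / (R * M) (illustrative sketch)."""
    total = len(grad_norms)
    # Assign each gradient norm in [0, 1] to one of num_bins equal-width bins.
    bin_ids = [min(int(g * num_bins), num_bins - 1) for g in grad_norms]
    # Count how many examples land in each bin (R for that bin).
    counts = {}
    for b in bin_ids:
        counts[b] = counts.get(b, 0) + 1
    # M is either the number of non-empty bins or the number of all bins.
    m = len(counts) if use_nonempty else num_bins
    return [total / (counts[b] * m) for b in bin_ids]


# All examples share the same gradient norm, as in the scenario above.
norms = [0.8] * 50
print(ghm_weights(norms, 100, use_nonempty=False)[0])    # 0.01
print(ghm_weights(norms, 10000, use_nonempty=False)[0])  # 0.0001
print(ghm_weights(norms, 100, use_nonempty=True)[0])     # 1.0
print(ghm_weights(norms, 10000, use_nonempty=True)[0])   # 1.0
```

Dividing by all bins makes the weight shrink as the bin count grows, while dividing by the non-empty bins keeps it at 1 regardless of how finely the range is split.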
Thank you for reading the code and paper so carefully.