HRank
The meaning of the "rank" of feature maps.
Thanks for your excellent work! When I tried to reproduce HRank on other networks, I found some interesting results:
- When I applied torch.matrix_rank to a network with leaky_relu, almost all channels had the same rank, and that rank was always full. At first, I thought this was because the network I used is not redundant.
- Then I applied torch.matrix_rank to the resnet56-cifar10 network in your repository, and it worked fine. However, if I compute torch.matrix_rank on the output of the BN layer (i.e. before the ReLU layer), all the ranks again become the same full rank!
- Hence, I put an extra ReLU layer after the original Leaky_ReLU in my own network, and the ranks behave normally, i.e. some channels have high ranks and other channels have low ranks.

So I'm curious about two things:
- What is the meaning of a low rank or even a zero rank? It does not seem to be related to the amount of information; it seems to reflect only the mean activation distribution.
- Is it reasonable to calculate the rank after adding an extra ReLU layer?
Zero rank denotes a zero matrix (feature map), which implies a zero filter that should be removed.
I think the ReLU layer is very important in this paper, since it zeroes out the negative activations and thus changes the rank of the feature map.
In other words, our discovery might be limited to networks equipped with ReLU layers.
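The effect described above can be illustrated with a small sketch. It is not the HRank code: the two 3x3 "feature maps" are hand-picked hypothetical values, the `matrix_rank`, `relu`, and `leaky_relu` helpers are plain-Python stand-ins written for this example, and the leaky slope of 0.01 is an assumption. In PyTorch the rank would normally come from `torch.linalg.matrix_rank` (the older `torch.matrix_rank` is deprecated in recent releases).

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds nothing to the rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def relu(rows):
    """Element-wise ReLU: negative entries become exactly zero."""
    return [[max(0.0, v) for v in r] for r in rows]


def leaky_relu(rows, slope=0.01):
    """Element-wise leaky ReLU: negative entries are scaled, never zeroed."""
    return [[v if v > 0 else slope * v for v in r] for r in rows]


# Hypothetical pre-activation (e.g. post-BN) feature maps for two channels.
all_negative = [[-1, -2, -3], [-4, -5, -7], [-2, -8, -9]]
mixed = [[1, -2, -3], [-4, 5, -6], [-7, -8, -9]]

for name, fmap in [("all_negative", all_negative), ("mixed", mixed)]:
    print(
        name,
        "pre-activation:", matrix_rank(fmap),
        "after ReLU:", matrix_rank(relu(fmap)),
        "after leaky ReLU:", matrix_rank(leaky_relu(fmap)),
    )
```

With these values, both channels are full rank (3) before activation and after leaky ReLU, while after ReLU the all-negative channel drops to rank 0 and the mixed channel to rank 2. This matches the observation in the thread: a dense real-valued map is generically full rank, and it is the exact zeros introduced by ReLU that make the ranks differ between channels.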
Thanks for your explanation. I think it makes sense. Still, I'll test whether adding extra ReLUs to the original network is effective.