iFormer
How to understand the channel ratio?
As mentioned in your paper, $C_h + C_l = C$, so the channel ratio satisfies $C_h/C + C_l/C = 1$. If you want to decrease the ratio of high-frequency channels, does that actually mean decreasing the number of high-frequency channels? From the formula, since the denominator is the same, the ratio of each part can only change if its numerator changes. I'm not sure whether I'm understanding this correctly. Could you give me some advice? Thank you.
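Writing $r = C_h / C$ for the high-frequency ratio, and assuming $C$ is fixed for a given stage, the relation above can be rearranged as:

$$
\frac{C_h}{C} + \frac{C_l}{C} = 1
\quad\Longrightarrow\quad
C_h = r\,C, \qquad C_l = (1 - r)\,C .
$$

So with $C$ fixed, lowering $r$ lowers $C_h$ and raises $C_l$ by the same number of channels.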
I'd like to share my understanding of this. You can refer to Table 7 in Appendix A.3 of the paper.
For instance, iFormer-S stacks iFormer blocks in a [3, 3, 9, 3] pattern across its four stages. Before the input features are fed to the blocks of each stage, a patch embedding layer tokenizes them, and the table notes the channel size it produces.
Looking at stage 1, the patch embedding layer returns feature maps with channel size 96 ($C = 96$). I think there is actually a typo: $C_h / h$ and $C_l / h$ should be $C_h / C$ and $C_l / C$. With that correction, you can read it as dividing $C = 96$ into 3 portions, with the high-frequency path taking 64 channels since $C_h / C = 2/3$ (and the low-frequency path taking the remaining 32). You can repeat the same process for the other stages.
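The stage-1 arithmetic can be sketched as a small helper. `split_channels` is a hypothetical name used only for illustration (it is not from the iFormer code); it uses `fractions.Fraction` so that a ratio like 2/3 is applied exactly, and it rejects ratios that would not give a whole number of channels.

```python
from fractions import Fraction


def split_channels(c: int, high_ratio: Fraction) -> tuple[int, int]:
    """Split a stage's C channels into (C_h, C_l) given r = C_h / C.

    Hypothetical helper for illustration; not taken from the iFormer code.
    """
    if not 0 <= high_ratio <= 1:
        raise ValueError("C_h / C must lie in [0, 1]")
    c_h = high_ratio * c  # exact rational arithmetic, no float rounding
    if c_h.denominator != 1:
        raise ValueError(f"ratio {high_ratio} does not give an integer C_h for C={c}")
    c_h = int(c_h)
    c_l = c - c_h  # because C_h + C_l = C
    return c_h, c_l


# Stage 1 of iFormer-S as described above: C = 96, C_h / C = 2/3
print(split_channels(96, Fraction(2, 3)))  # (64, 32)
```

With `C` fixed at 96, passing a smaller ratio (for example a hypothetical `Fraction(1, 3)`) returns `(32, 64)`: fewer high-frequency channels and more low-frequency ones.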
[NOTICE] All iFormer blocks stacked within a stage keep the same channel size, namely the one defined for that stage, i.e., [96, 192, 320, 384] for iFormer-S. The same rule applies to any architecture that follows a hierarchical Transformer design.
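A minimal sketch of that rule, using only the iFormer-S numbers quoted in this thread (depths [3, 3, 9, 3], channels [96, 192, 320, 384]). `DEPTHS`, `DIMS`, and `block_channels` are hypothetical names, and this is not the official model definition; it just lists the channel size each block sees, showing that C changes only at a stage's patch embedding.

```python
# Hypothetical config mirroring the iFormer-S numbers quoted above;
# not the official model code.
DEPTHS = [3, 3, 9, 3]          # blocks per stage
DIMS = [96, 192, 320, 384]     # channel size C per stage


def block_channels(depths, dims):
    """Return (stage, C) for every block, in order.

    The channel size changes only at each stage's patch embedding;
    every block stacked inside a stage keeps that stage's C.
    """
    layout = []
    for stage, (depth, c) in enumerate(zip(depths, dims), start=1):
        # Patch embedding maps the previous channel size to c here,
        # then `depth` blocks all operate at c channels.
        layout.extend((stage, c) for _ in range(depth))
    return layout


layout = block_channels(DEPTHS, DIMS)
print(len(layout))  # 18
print(layout[:4])   # [(1, 96), (1, 96), (1, 96), (2, 192)]
```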
(P.S.) As I'm not a native English speaker, I apologize if my explanation is unclear. If anything is wrong, please don't hesitate to give me feedback.
I'd like to know how $C_h / C$ is chosen for the blocks in each of the four stages. Is it by grid search?