OREPA_CVPR2022

Could you explain Proposition 1? I don't quite understand it. Thanks!

Open · swaiwi opened this issue 2 years ago · 1 comment

Proposition 1. When parts or all of a single-branch linear mapping are re-parameterized into multi-branch topologies that are two or more layers deep, the entire end-to-end weight matrix is optimized differently. If one layer of the mapping is re-parameterized into multi-branch topologies at most one layer deep, the optimization remains unchanged.

swaiwi · May 12 '22

Thanks for your interest swaiwi! To make things simpler, a convolution can be regarded as a linear mapping over feature vectors. Let's denote its weights as w; these weights are optimized during training by SGD or some other optimizer. If we implement w as a product of two weight instances, w = w1 * w2, where w1 and w2 are optimized separately, the conv layer is effectively re-parameterized into two sequential layers. If instead we implement w as a sum, w = w1 + w2, the conv layer is re-parameterized into two parallel one-layer branches. In the parallel case each branch receives exactly the same gradient as w, so the end-to-end weight still follows a plain gradient step (only rescaled); in the sequential case the gradients of w1 and w2 are coupled through each other, so the end-to-end update is no longer a plain gradient step on w. With Proposition 1 we want to point out which kinds of re-parameterized structures change the optimization step (the former case but not the latter), since changing it is a necessary condition for the effectiveness of re-parameterized blocks.

JUGGHM · May 13 '22
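
To make the distinction concrete, here is a minimal PyTorch sketch (not part of the original thread; the data, shapes, and learning rate are all illustrative) comparing one SGD step on a plain linear map w against one step on each of its two re-parameterized forms. It checks that the parallel form w = w1 + w2 reproduces the direct gradient step (only with the effective learning rate doubled), while the sequential form w = w1 @ w2 does not.

```python
import torch

torch.manual_seed(0)
lr = 1e-2
x = torch.randn(8, 4)   # batch of feature vectors (illustrative data)
y = torch.randn(8, 4)   # regression targets (illustrative)

w0 = torch.randn(4, 4)  # shared initial end-to-end weight matrix

def loss_of(w_end):
    return ((x @ w_end - y) ** 2).mean()

def sgd_step(params, loss):
    grads = torch.autograd.grad(loss, params)
    return [p - lr * g for p, g in zip(params, grads)]

# Case 0: optimize the single-branch mapping w directly.
w = w0.clone().requires_grad_()
(w_direct,) = sgd_step([w], loss_of(w))
grad_step = w0 - w_direct  # equals lr * dL/dw at w0

# Case 1: parallel re-parameterization w = w1 + w2 (two one-layer branches).
# dL/dw1 = dL/dw2 = dL/dw, so the end-to-end update is w0 - 2*lr*dL/dw:
# the same gradient direction, only the effective learning rate changes.
w1 = (0.5 * w0).clone().requires_grad_()
w2 = (0.5 * w0).clone().requires_grad_()
w1_new, w2_new = sgd_step([w1, w2], loss_of(w1 + w2))
print(torch.allclose(w1_new + w2_new, w0 - 2 * grad_step, atol=1e-6))  # True

# Case 2: sequential re-parameterization w = u @ v (two stacked layers).
# The gradients of u and v are coupled through each other, so the
# end-to-end update becomes a preconditioned gradient step, which in
# general points in a different direction than dL/dw.
u = w0.clone().requires_grad_()
v = torch.eye(4, requires_grad=True)
u_new, v_new = sgd_step([u, v], loss_of(u @ v))
print(torch.allclose(u_new @ v_new, w_direct, atol=1e-6))  # False
```

The doubled step in the parallel case comes from the two branches each taking a full gradient step; up to that constant learning-rate rescaling, the end-to-end trajectory is exactly the one plain SGD would take, which is how I read Proposition 1's claim that such topologies leave the optimization unchanged.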