MSDNet
Pruning logic
According to the paper,

> One simple strategy to reduce the size of the network is by splitting it into S blocks along the depth dimension, and only keeping the coarsest (S - i + 1) scales in the ith block.
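For intuition, here is a minimal plain-Lua sketch of that rule (illustration only, not code from the repo; it assumes scales are numbered 1 = finest up to S = coarsest, so the coarsest S - i + 1 scales in block i are scales i through S):

```lua
-- Illustration of the paper's block/scale rule (assumed numbering:
-- scale 1 = finest, scale S = coarsest).
local S = 3
for i = 1, S do
  -- Block i keeps the coarsest S - i + 1 scales, i.e. scales i .. S.
  local kept = {}
  for s = i, S do kept[#kept + 1] = s end
  print(string.format("block %d keeps scales {%s}", i, table.concat(kept, ", ")))
end
```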
How is the network split into S blocks? According to the pruning logic in JointTrainContainer.lua:
```lua
elseif opt.prune == 'max' then
   local interval = torch.ceil(layer_all/opt.nScales)
   inScales = opt.nScales - torch.floor((math.max(0, layer_tillnow -2))/interval)
   outScales = opt.nScales - torch.floor((layer_tillnow -1)/interval)
```
Consider a toy example with 4 blocks, a linearly increasing step mode (step = 1, base = 1), and a maximum of 3 scales, so the blocks have 1, 2, 3, and 4 layers, giving layer_all = 10 and interval = ceil(10/3) = 4. The numbers of layers with input scales 3, 2, and 1 are then 5, 4, and 1, respectively. Why is the number of layers in each split uneven?
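To make the arithmetic checkable, here is a minimal plain-Lua sketch of the formulas above, using math.ceil/math.floor in place of the torch calls (it assumes that the linearly increasing step mode with base = 1, step = 1 gives blocks of 1, 2, 3, and 4 layers):

```lua
-- Re-implementation of the 'max' pruning schedule for the toy example
-- (4 blocks, linearly increasing step, step = 1, base = 1, nScales = 3).
local nScales = 3
local layers_per_block = {1, 2, 3, 4}   -- base + step*(i-1), assumed block sizes
local layer_all = 0
for _, n in ipairs(layers_per_block) do layer_all = layer_all + n end  -- 10

local interval = math.ceil(layer_all / nScales)  -- ceil(10/3) = 4
local counts = {}
for layer_tillnow = 1, layer_all do
  local inScales  = nScales - math.floor(math.max(0, layer_tillnow - 2) / interval)
  local outScales = nScales - math.floor((layer_tillnow - 1) / interval)
  counts[inScales] = (counts[inScales] or 0) + 1
  print(string.format("layer %2d: inScales = %d, outScales = %d",
                      layer_tillnow, inScales, outScales))
end
-- counts[3] = 5, counts[2] = 4, counts[1] = 1, matching the numbers above.
```

One observation from the printout: because of the -2 versus -1 offsets in the two formulas, inScales drops one layer later than outScales, which appears to be part of why the splits by input scale come out as 5/4/1 rather than the 4/4/2 grouping that outScales alone would give.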