RePr

Question about reinitialize

Open philokey opened this issue 5 years ago • 6 comments

Hi, I'm a little confused about reinitialize. During pruning you set W.data[mask[name]] to zero; however, in reinitialize you don't restore the weights of the dropped filters. The paper says "we reinitialize the filters to be orthogonal to its value before being dropped". Your implementation doesn't seem quite right to me. Could you explain why you implemented it this way? Thank you very much.
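To make the concern concrete: if pruning zeroes filters in place, their pre-pruning values are lost unless a copy is taken first. A minimal sketch, assuming each filter is flattened into one row of a 2-D NumPy array; prune_filters is a hypothetical helper for illustration, not the repo's code:

```python
import numpy as np

def prune_filters(W, drop_idx):
    """Zero the filters (rows) listed in drop_idx and return their old values.

    Hypothetical helper: W is assumed to have shape (num_filters, fan_in),
    i.e. each conv filter flattened into one row.
    """
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[drop_idx] = True
    dropped = W[mask].copy()  # save the pre-pruning values before zeroing
    W[mask] = 0.0             # prune in place, analogous to W.data[mask[name]] = 0
    return mask, dropped

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
original = W.copy()
mask, dropped = prune_filters(W, [1, 3])
print(np.allclose(dropped, original[[1, 3]]))  # True: old values were kept
print(np.all(W[mask] == 0))                     # True: filters are pruned
```

Without the saved copy, a later reinitialization step has nothing "before being dropped" to be orthogonal to.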

philokey, Mar 25 '19 04:03

I think the pruned filters should be restored to their values before pruning, while the non-pruned filters keep their current values; then null_space = self.qr_find_null(W2d.cpu().detach().numpy()) can be used to find a null space orthogonal to both.
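The procedure proposed above can be sketched as follows, assuming filters are flattened into rows of a NumPy array and the dropped filters' pre-pruning values were saved. The helpers null_space and reinitialize are hypothetical; null_space uses np.linalg.svd rather than the repo's QR-based qr_find_null, and drawing each new filter as a random combination of null-space basis vectors is an assumption made for illustration:

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of {x : A @ x = 0}, computed with an SVD."""
    _, s, Vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def reinitialize(W, mask, dropped, rng):
    """Refill the pruned rows of W with vectors orthogonal to both the
    dropped filters' pre-pruning values and the kept filters' current values.

    W: (num_filters, fan_in) with pruned rows currently zero.
    mask: boolean array, True for pruned rows.
    dropped: saved pre-pruning values of the pruned rows.
    """
    constraints = np.vstack((dropped, W[~mask]))  # old dropped + current kept
    N = null_space(constraints)                   # shape (fan_in, d)
    k = int(mask.sum())
    # require d >= k so the new filters can be linearly independent
    if N.shape[1] < k:
        raise ValueError("null space too small for the pruned filters")
    coeffs = rng.standard_normal((N.shape[1], k))
    W[mask] = (N @ coeffs).T                      # every new row lies in the null space
    return W

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 8))        # 3 filters, each flattened to 8 weights
mask = np.array([False, True, False])
dropped = W[mask].copy()
W[mask] = 0.0                          # prune filter 1
reinitialize(W, mask, dropped, rng)
print(np.abs(np.vstack((dropped, W[~mask])) @ W[mask].T).max())  # near machine precision
```

The null space only exists with enough dimensions when fan_in exceeds the number of constraining filters, which is why the sketch checks its size before drawing new filters.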

zeng-hello-world, Mar 25 '19 08:03

By the way, the operation at https://github.com/siahuat0727/RePr/blob/master/main.py#L190 doesn't seem to be mentioned in the paper.

philokey, Mar 25 '19 09:03

@philokey That's my mistake, thanks a lot! For the channel initialization part, I have tried initializing the channels randomly and with all zeros, and random initialization looks better. I have also asked the paper's author about this part and will post an update if I learn anything.

siahuat0727, Mar 25 '19 09:03

Another question: what if all filters of a layer are pruned? (This can happen occasionally.)
Then all data passed forward through that layer would come out as zeros...
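One possible safeguard (an assumption, not something the thread or repo confirms) is to rank filters across all layers but never prune a layer below a minimum number of filters. In the sketch below, global_prune_with_floor, the layer names, and the scores are hypothetical; a higher score simply means "more prunable" and does not reproduce RePr's ranking metric:

```python
import numpy as np

def global_prune_with_floor(scores_by_layer, prune_ratio, min_keep=1):
    """Pick filters to prune from one global ranking, keeping >= min_keep per layer.

    scores_by_layer: {layer_name: 1-D array}, higher score = more prunable.
    Returns {layer_name: sorted list of pruned filter indices}.
    """
    entries = [(float(s), name, i)
               for name, scores in scores_by_layer.items()
               for i, s in enumerate(scores)]
    entries.sort(key=lambda e: e[0], reverse=True)   # most prunable first
    budget = int(prune_ratio * len(entries))
    remaining = {name: len(scores) for name, scores in scores_by_layer.items()}
    pruned = {name: [] for name in scores_by_layer}
    for _, name, i in entries:
        if budget == 0:
            break
        if remaining[name] > min_keep:               # never empty a layer
            pruned[name].append(i)
            remaining[name] -= 1
            budget -= 1
    return {name: sorted(idx) for name, idx in pruned.items()}

scores = {"conv1": np.array([0.9, 0.8, 0.95]),
          "conv2": np.array([0.1, 0.2, 0.3, 0.4])}
print(global_prune_with_floor(scores, prune_ratio=0.5))
# {'conv1': [0, 2], 'conv2': [3]}
```

Here all three conv1 scores outrank every conv2 score, so a plain top-3 global selection would empty conv1; the floor keeps one conv1 filter and spends the remaining budget on conv2.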

zeng-hello-world, Mar 25 '19 11:03

I have two questions. First, about the architecture of your vanilla network: if it is trained from scratch in the standard way, the network should already be overfitting, and the test curve looks quite different from the one reported in the paper. Second, about the reinitialization step: your code has null_space = qr_null(W2d if drop_filters[name] is None else np.vstack((drop_filters[name], W2d))), where W2d is the layer's full filter matrix, in which some filters may have been pruned while the rest serve as the sub-network's filters. But the np.vstack seems wrong, since W2d already includes the whole matrix. Finally, thank you for your contribution.

tiandunx, Apr 30 '19 07:04

@tiandunx Hi, thanks for checking.

  1. I'm not sure I understand your point. I also wonder how my experimental settings differ from the paper's.

  2. If I understand correctly, this only affects the efficiency of the code, since all-zero rows do not change the null space (a zero row r satisfies r · x = 0 for every x):

```console
$ cat test.py
import numpy as np
from utils import qr_null

whole = np.random.randn(3, 4)
whole[0] = 0
print(whole)
print(qr_null(whole))
print()

sub = whole[1:]
print(sub)
print(qr_null(sub))
print()

print(np.array_equal(qr_null(whole), qr_null(sub)))

$ python test.py
[[ 0.          0.          0.          0.        ]
 [-1.93893849  0.17246339  0.40822182  0.45453628]
 [ 0.59895742 -1.22694375  0.70782981 -0.37624858]]
[[ 0.22369862  0.21604992]
 [ 0.56086785 -0.17220338]
 [ 0.79668953  0.02931069]
 [ 0.02592232  0.96062964]]

[[-1.93893849  0.17246339  0.40822182  0.45453628]
 [ 0.59895742 -1.22694375  0.70782981 -0.37624858]]
[[ 0.22369862  0.21604992]
 [ 0.56086785 -0.17220338]
 [ 0.79668953  0.02931069]
 [ 0.02592232  0.96062964]]

True
```
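np.array_equal passes in the run above, but in general two orthonormal bases of the same null space can differ by a rotation or a sign flip, so a basis-independent check compares the orthogonal projectors N @ N.T instead. This sketch is self-contained: it does not use the repo's utils.qr_null, and the hypothetical helpers null_space and same_subspace compute the null space with np.linalg.svd so the example runs without the repo:

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of {x : A @ x = 0}, computed with an SVD."""
    _, s, Vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def same_subspace(N1, N2, atol=1e-8):
    """True if two orthonormal bases span the same subspace (projectors match)."""
    return N1.shape == N2.shape and np.allclose(N1 @ N1.T, N2 @ N2.T, atol=atol)

rng = np.random.default_rng(0)
whole = rng.standard_normal((3, 4))
whole[0] = 0        # an all-zero (pruned) row
sub = whole[1:]     # the same matrix without that row
print(same_subspace(null_space(whole), null_space(sub)))  # True
```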

siahuat0727, May 01 '19 08:05