CoFiPruning
About the diag() and distillation in your paper
Hi @xiamengzhou, many thanks for your contribution. I have a few small questions about your paper. In the paper you say that FFN pruning introduces a mask z_int, and there is an equation involving diag(z_int). What is diag, and why do we have to put z_int into a diagonal matrix? Is diag(z_int) of size d_f × d_f?
![image](https://user-images.githubusercontent.com/84232793/190911165-7c5dca85-4fcb-494d-b195-0f30c7f06685.png)
You also say that the method prunes

> coarse-grained and fine-grained units (§3.1) with a layerwise distillation objective transferring knowledge from unpruned to pruned models (§3.2)

and, about previous methods:

> However, distilling intermediate layers during the pruning process is challenging as the model structure changes throughout training.
So are we pruning a student model during distillation?
![image](https://user-images.githubusercontent.com/84232793/191038498-2e506fb5-321d-472b-a245-0614bd9d0b7d.png)
Many thanks!!
Hi,
Thanks for reaching out!
For your first question, multiplying the representations by diag(z_int) multiplies each output dimension of the representations by its corresponding mask value. We write diag(z_int) simply as matrix notation for this operation.
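To make the masking concrete, here is a minimal plain-Python sketch, assuming a toy representation matrix H of shape n × d_f and a 0/1 mask z_int of length d_f. The helpers `diag` and `matmul` are hypothetical illustrations, not code from the CoFiPruning repo. Right-multiplying H by diag(z_int) scales column j of H by z_int[j], so a zero entry removes that intermediate dimension.

```python
# Minimal sketch (hypothetical helpers, not from the CoFiPruning repo):
# right-multiplying H (n x d_f) by diag(z_int) scales column j of H by z_int[j].

def diag(z):
    """Build the d_f x d_f matrix with z on its diagonal and zeros elsewhere."""
    d_f = len(z)
    return [[z[i] if i == j else 0.0 for j in range(d_f)] for i in range(d_f)]

def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Toy intermediate representations: n = 2 tokens, d_f = 4 intermediate dimensions.
H = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
z_int = [1.0, 0.0, 1.0, 0.0]  # keep dimensions 0 and 2, prune dimensions 1 and 3

D = diag(z_int)
masked = matmul(H, D)
print(len(D), len(D[0]))  # 4 4  (diag(z_int) is d_f x d_f)
print(masked)             # [[1.0, 0.0, 3.0, 0.0], [5.0, 0.0, 7.0, 0.0]]
```

The pruned columns come out as zeros, so the matching rows of the down-projection that follows contribute nothing.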
For your second question, yes! CoFi pruning prunes a student model with a distillation objective.
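As a rough illustration of "pruning a student with a distillation objective", here is a generic knowledge-distillation sketch. It is not CoFi's exact objective (see §3.2 of the paper for that). It assumes the masked (pruned) model is the student and the unpruned model is the teacher, and it combines a prediction-level KL term with a hidden-state MSE term. The weight `lam`, the `temperature`, and all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for discrete distributions; p is the teacher, q the student."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_loss(t_logits, s_logits, t_hidden, s_hidden, lam=0.5, temperature=1.0):
    """Hypothetical combined loss: lam * prediction KL + (1 - lam) * hidden-state MSE."""
    pred = kl_div(softmax(t_logits, temperature), softmax(s_logits, temperature))
    layer = mse(t_hidden, s_hidden)
    return lam * pred + (1 - lam) * layer

# Teacher = unpruned model outputs; student = masked model outputs (toy numbers).
teacher_logits, student_logits = [2.0, 0.5, -1.0], [1.5, 0.7, -0.8]
teacher_hidden, student_hidden = [0.1, 0.2, 0.3], [0.1, 0.0, 0.3]
print(distill_loss(teacher_logits, teacher_logits, teacher_hidden, teacher_hidden))  # 0.0
print(distill_loss(teacher_logits, student_logits, teacher_hidden, student_hidden) > 0)  # True
```

During pruning, the student's masks and weights would be updated to reduce this kind of loss, so the pruned model keeps tracking the unpruned one.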
Feel free to reach out again if you have more questions :)
Hi, so diag(z_int) is a diagonal matrix with z_int on its diagonal?
Yes!
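Written out, and assuming the FFN form FFN(X) = gelu(X W_U) · diag(z_int) · W_D with X ∈ ℝ^{n×d}, W_U ∈ ℝ^{d×d_f}, and W_D ∈ ℝ^{d_f×d} (a reconstruction; check it against the equation in the screenshot above), the shapes confirm that diag(z_int) is d_f × d_f:

```latex
\mathrm{diag}(z_{\mathrm{int}}) =
\begin{pmatrix}
z_1 & & \\
& \ddots & \\
& & z_{d_f}
\end{pmatrix} \in \mathbb{R}^{d_f \times d_f},
\qquad
\underbrace{\mathrm{gelu}(X W_U)}_{n \times d_f}\;
\underbrace{\mathrm{diag}(z_{\mathrm{int}})}_{d_f \times d_f}\;
\underbrace{W_D}_{d_f \times d}
\in \mathbb{R}^{n \times d}
```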
Thanks! So why does it have to be a diagonal matrix? Could a non-diagonal matrix replace it, as long as that matrix represents the corresponding mask?
Yes, it can! We use diag in our paper for mathematical correctness.
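In code, the diagonal matrix never needs to be built. The minimal plain-Python sketch below compares an explicit H · diag(z) product against a direct per-column scaling and shows that they agree. The function names are hypothetical, and this does not claim to describe how the CoFiPruning repo implements the mask. The equivalence holds for any real-valued z, not only 0/1 masks.

```python
# Minimal sketch (hypothetical names): h @ diag(z) equals scaling column j of h by z[j],
# so the d_f x d_f matrix never has to be materialized.

def diag_matmul(h, z):
    """Reference: build diag(z) explicitly, then multiply (O(d_f^2) extra memory)."""
    d_f = len(z)
    d = [[z[i] if i == j else 0.0 for j in range(d_f)] for i in range(d_f)]
    return [[sum(row[t] * d[t][j] for t in range(d_f)) for j in range(d_f)] for row in h]

def elementwise_mask(h, z):
    """Equivalent: multiply each entry h[i][j] by z[j] (only O(d_f) memory for the mask)."""
    return [[x * m for x, m in zip(row, z)] for row in h]

H = [[0.5, -1.0, 2.0],
     [3.0, 4.0, -2.5]]
z_int = [0.0, 1.0, 0.5]

print(elementwise_mask(H, z_int))                            # [[0.0, -1.0, 1.0], [0.0, 4.0, -1.25]]
print(elementwise_mask(H, z_int) == diag_matmul(H, z_int))  # True
```

The diagonal-matrix form is convenient in the written equations, while the elementwise form avoids allocating a d_f × d_f matrix.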
Hi, I am closing this issue now! Feel free to reopen it if you have more questions :)