CoFiPruning

About the diag() and distillation in your paper

Open CaffreyR opened this issue 1 year ago • 5 comments

Hi @xiamengzhou, many thanks for your contribution. I have a few small questions about your paper. In the paper you say that

FFN pruning introduces a mask z_int

The paper also gives an equation for this, but what is diag? Why do we have to put z_int into a diagonal matrix? Is diag(z_int) of size d_f × d_f?

image

You also say that

Coarse-grained and Fine-grained units (§3.1) with a layerwise distillation objective transferring knowledge from unpruned to pruned models (§3.2)

However, distilling intermediate layers during the pruning process is challenging as the model structure changes throughout training. (previous method)

So are we pruning a student model during distillation?

image

Many thanks!!

CaffreyR, Sep 18 '22 14:09

Hi,

Thanks for reaching out!

For your first question, multiplying the representations by diag(z_int) essentially multiplies each output dimension of the representations by the corresponding mask entry. We use diag(z_int) as matrix notation.
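To make the point about diag(z_int) concrete, here is a minimal NumPy sketch (not code from the CoFiPruning repository). It assumes the masked quantity is a matrix `H` of shape (tokens, d_f), standing in for the FFN's intermediate activations; the sizes and the mask values are made up for illustration. It shows that diag(z_int) is a d_f × d_f matrix and that right-multiplying by it gives the same result as scaling each column of `H` by its mask entry.

```python
import numpy as np


def mask_with_diag(H, z):
    """Matrix notation: right-multiply by the (d_f, d_f) diagonal matrix diag(z)."""
    return H @ np.diag(z)


def mask_with_broadcast(H, z):
    """Equivalent elementwise form: scale column j of H by z[j]."""
    return H * z


rng = np.random.default_rng(0)

n_tokens, d_f = 4, 6                               # hypothetical sizes
H = rng.standard_normal((n_tokens, d_f))           # stand-in for the intermediate activations
z_int = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])   # one mask entry per intermediate dimension

D = np.diag(z_int)                                 # z_int on the diagonal, zeros elsewhere
masked_matmul = mask_with_diag(H, z_int)
masked_broadcast = mask_with_broadcast(H, z_int)

print(D.shape)                                     # (d_f, d_f)
print(np.allclose(masked_matmul, masked_broadcast))
```

In practice the broadcast form is cheaper, since it never materializes the d_f × d_f matrix; the diag form is mainly convenient for writing the operation as a chain of matrix products.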

For your second question, yes! CoFi pruning prunes a student model with a distillation objective.

Feel free to reach out again if you have more questions :)

xiamengzhou, Sep 19 '22 19:09

Hi, so diag(z_int) is a diagonal matrix with z_int on its diagonal?

CaffreyR, Sep 20 '22 01:09

Yes!

xiamengzhou, Sep 20 '22 01:09

Thanks! So why does it have to be a diagonal matrix? Can a non-diagonal matrix replace it, as long as it represents the corresponding mask?

CaffreyR, Sep 20 '22 14:09

Yes, it can! We use diag in our paper for mathematical correctness.
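One example of a non-diagonal representation is a selection matrix. The sketch below is an illustration under stated assumptions, not the repository's implementation: it assumes a strictly 0/1 mask and a hypothetical down-projection `W_D` applied after the mask. For a binary mask, the d_f × k matrix `S` whose columns are the one-hot vectors of the kept dimensions satisfies S Sᵀ = diag(z_int), so masking with diag(z_int) gives the same output as physically dropping the pruned columns and rows.

```python
import numpy as np


def selection_matrix(z):
    """Return S of shape (d_f, k): the identity columns for the kept (nonzero) dimensions."""
    z = np.asarray(z)
    kept = np.flatnonzero(z)
    return np.eye(len(z))[:, kept]


rng = np.random.default_rng(0)

n_tokens, d_f, d = 3, 5, 4                 # hypothetical sizes
H = rng.standard_normal((n_tokens, d_f))   # stand-in for the intermediate activations
W_D = rng.standard_normal((d_f, d))        # hypothetical down-projection
z_int = np.array([1, 0, 1, 0, 1])          # binary mask, 1 = keep

S = selection_matrix(z_int)

out_diag = H @ np.diag(z_int) @ W_D                  # diagonal-mask notation
out_select = (H @ S) @ (S.T @ W_D)                   # non-diagonal selection matrix
out_sliced = H[:, z_int == 1] @ W_D[z_int == 1, :]   # plain slicing, no mask matrix at all

print(np.allclose(out_diag, out_select), np.allclose(out_diag, out_sliced))
```

The identity relies on the mask being exactly 0 or 1. If the mask takes fractional values, S Sᵀ no longer equals diag(z_int), and the diagonal (or broadcast) form is the one that matches the notation.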

xiamengzhou, Sep 20 '22 14:09

Hi, I am closing this issue now! Feel free to reopen it if you have more questions :)

xiamengzhou, Nov 01 '22 19:11