nn_pruning
Is this pruning method generally applicable to multi-head attention?
The documentation tests BERT models and reports good results. The question is whether the nn_pruning method can also be applied to other Transformer models, such as Google ViT, Swin Transformer, and so on.
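For context, here is a minimal, model-agnostic sketch of the underlying idea (head-level block pruning of attention projection weights), written in plain PyTorch. It deliberately does not use nn_pruning's actual API; the function names and the keep_ratio parameter are illustrative assumptions. It only shows why the approach is not BERT-specific: ViT and Swin attention blocks also reduce to ordinary nn.Linear Q/K/V projections that can be scored and masked block by block.

```python
# Sketch only: head-level block pruning on generic attention projections.
# Not nn_pruning's API; names below (head_block_scores, prune_heads_inplace,
# keep_ratio) are hypothetical.
import torch
import torch.nn as nn

def head_block_scores(weight: torch.Tensor, num_heads: int) -> torch.Tensor:
    """L2 norm of each head's block of rows in a (hidden, hidden) projection."""
    head_dim = weight.shape[0] // num_heads
    blocks = weight.view(num_heads, head_dim, -1)
    return blocks.flatten(1).norm(dim=1)

def prune_heads_inplace(q: nn.Linear, k: nn.Linear, v: nn.Linear,
                        num_heads: int, keep_ratio: float = 0.5) -> None:
    """Zero out the rows of the lowest-scoring heads in the Q/K/V projections."""
    scores = (head_block_scores(q.weight, num_heads)
              + head_block_scores(k.weight, num_heads)
              + head_block_scores(v.weight, num_heads))
    keep = scores.topk(max(1, int(num_heads * keep_ratio))).indices
    head_dim = q.weight.shape[0] // num_heads
    mask = torch.zeros(num_heads, dtype=torch.bool)
    mask[keep] = True
    row_mask = mask.repeat_interleave(head_dim).unsqueeze(1).float()
    for lin in (q, k, v):
        with torch.no_grad():
            lin.weight.mul_(row_mask)          # zero whole head blocks
            if lin.bias is not None:
                lin.bias.mul_(row_mask.squeeze(1))

# Toy example with BERT/ViT-base-like dimensions; the same code would apply to
# the Q/K/V linears inside a ViT or Swin attention block.
hidden, heads = 768, 12
q, k, v = (nn.Linear(hidden, hidden) for _ in range(3))
prune_heads_inplace(q, k, v, num_heads=heads, keep_ratio=0.5)
print("nonzero rows in q:", (q.weight.abs().sum(dim=1) > 0).sum().item())
```

The caveat for Swin would presumably be its windowed attention and relative position bias, which this sketch ignores; only the linear projections are touched here.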