nn_pruning
Why do you say it's not needed to run the models pruned by the nn_pruning tools?
In the README.md, why did you say that "it's not needed to run the models pruned by the nn_pruning tools"?
The nn_pruning tool removes entire heads in the attention layers and entire rows/columns in the feed-forward networks. The remaining heads are then fairly dense, and the feed-forward networks are completely dense once the pruned rows/columns are physically removed. As a result, pytorch_block_sparse cannot compete on such a mildly sparse network with the highly optimized standard dense linear algebra kernels: there are not enough zeros left for pytorch_block_sparse to win, so plain standard PyTorch operations are faster.
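To make the point concrete, here is a minimal sketch, not the actual nn_pruning API, of why structured pruning of a feed-forward pair leaves purely dense weights. The helper name `shrink_ffn` and the magnitude-based selection of kept units are illustrative assumptions:

```python
# Minimal sketch (NOT the nn_pruning API): removing whole hidden units
# of an FFN pair deletes rows of fc1 and columns of fc2, leaving
# smaller matrices that contain no zeros at all.
import torch
import torch.nn as nn


def shrink_ffn(fc1: nn.Linear, fc2: nn.Linear, keep: torch.Tensor):
    """Keep only the intermediate units indexed by `keep` (1-D LongTensor).

    fc1: hidden -> intermediate, fc2: intermediate -> hidden.
    Dropping one intermediate unit removes a row of fc1 and a column of fc2.
    """
    new_fc1 = nn.Linear(fc1.in_features, len(keep))
    new_fc1.weight.data = fc1.weight.data[keep]      # keep selected rows
    new_fc1.bias.data = fc1.bias.data[keep]

    new_fc2 = nn.Linear(len(keep), fc2.out_features)
    new_fc2.weight.data = fc2.weight.data[:, keep]   # keep selected columns
    new_fc2.bias.data = fc2.bias.data.clone()
    return new_fc1, new_fc2


# Example: shrink a 768 -> 3072 -> 768 FFN to 1024 intermediate units
# (selection criterion here is illustrative, not nn_pruning's).
fc1, fc2 = nn.Linear(768, 3072), nn.Linear(3072, 768)
keep = torch.topk(fc1.weight.abs().sum(dim=1), k=1024).indices
small_fc1, small_fc2 = shrink_ffn(fc1, fc2, keep)
# small_fc1 and small_fc2 are fully dense, so standard dense matmul
# kernels are the fastest way to run them; a block-sparse kernel has
# nothing to skip.
```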