model-optimization
Structural (filter) pruning for convolutional layers
System information
- TensorFlow version (you are using): 2.5.0
- Are you willing to contribute it (Yes/No): Yes
Motivation
Deciding on where to have high filter/channel counts in convnets can be difficult, and smarter reductions in these numbers can lead to faster inference time across all devices.
Pruning is currently not very useful on GPU, since sparse operations are much slower than dense operations, so it would be useful to have a method of pruning that results in a reduced dense representation.
My current (unfinished) implementation doesn't require many additional components, since it works similarly to block sparsity and can reuse much of that code.
Describe the feature
Add an option to `prune_low_magnitude` for "filter pruning" (also called "structural pruning") that restricts pruning of supported layers to whole blocks of weights at a time. For convolutional layers, these blocks correspond to the output channels (filters) of the layer.
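The per-filter selection described above can be sketched in a few lines of numpy. This is an illustration of the idea only: the function name `filter_pruning_mask` and its signature are hypothetical, not part of the tfmot API.

```python
import numpy as np

def filter_pruning_mask(kernel, sparsity):
    """Zero out whole output channels (filters) of a conv kernel.

    Hypothetical sketch of the proposed "filter pruning" mode.
    kernel:   array of shape (H, W, in_channels, out_channels).
    sparsity: fraction of output channels to prune.
    """
    # Magnitude of each filter: L2 norm over (H, W, in_channels).
    norms = np.linalg.norm(kernel.reshape(-1, kernel.shape[-1]), axis=0)
    n_prune = int(np.floor(sparsity * kernel.shape[-1]))
    mask = np.ones(kernel.shape[-1], dtype=kernel.dtype)
    if n_prune > 0:
        # Prune the filters with the smallest norms.
        mask[np.argsort(norms)[:n_prune]] = 0.0
    # Broadcast the per-filter mask over H, W and in_channels.
    return kernel * mask
```

The key difference from element-wise pruning is that the mask has one entry per output channel rather than one per weight, so every zeroed filter is removable as a dense unit.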
In addition, an option is added to `strip_pruning` to restructure layers pruned in this manner, rebuilding them with fewer output channels than the original layers. The change in shape must be propagated forward to subsequent layers.
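The restructuring step can be sketched as follows. The function name `strip_zero_filters` is illustrative (it is not the proposed tfmot API); the point is that removing a dead output channel also requires slicing the input dimension of the next layer's kernel.

```python
import numpy as np

def strip_zero_filters(kernel, bias, next_kernel):
    """Drop all-zero output channels and propagate the shape change.

    Illustrative sketch of the restructuring proposed for strip_pruning.
    kernel:      (H, W, Cin, Cout) weights of the pruned conv layer.
    bias:        (Cout,) bias of the pruned conv layer.
    next_kernel: (H2, W2, Cout, Cout2) weights of the following conv layer.
    """
    # An output channel is dead if every weight feeding it is zero.
    keep = np.any(kernel != 0, axis=(0, 1, 2))
    # Slice this layer's outputs and the next layer's inputs to match.
    return kernel[..., keep], bias[keep], next_kernel[:, :, keep, :]
```

A real implementation would also have to handle batch normalization parameters, residual connections, and other consumers of the pruned tensor, which is why the shape change must be propagated through the whole graph.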
Describe how the feature helps achieve the use case
With these two additions, models can be pruned in a way that is meaningful when running on GPU, saving both memory and compute. It also becomes possible to find a reasonable layout for the number of output channels in each layer without hyperparameter tuning. In short, this feature makes pruning useful on GPU, where it currently is not.
Describe how existing APIs don't satisfy your use case
Using `tfmot.python.core.sparsity.keras.prune.prune_low_magnitude` on a convolutional layer considers each element of the weights variable on its own, which very rarely leads to pruning that can reduce inference time on GPU.
In addition, `tfmot.python.core.sparsity.keras.prune.strip_pruning` always leaves the zeroed weights in place, even when a reduction in the size of the layer would be beneficial. If all weights of a filter in the kernel of a convolutional layer are zero, `strip_pruning` leaves restructuring as a step for the runtime.
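To see why element-wise pruning rarely helps here, consider global magnitude pruning at 50% sparsity on a typical conv kernel: the zeros are scattered, so essentially no filter ends up entirely zero and nothing can be stripped. The numpy snippet below demonstrates this; it models the pruning criterion, not tfmot's internals.

```python
import numpy as np

# Random conv kernel, shape (H, W, in_channels, out_channels).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 16, 32))

# Global magnitude pruning: zero the smallest 50% of weights.
threshold = np.quantile(np.abs(kernel), 0.5)
pruned = np.where(np.abs(kernel) <= threshold, 0.0, kernel)

# Count output channels whose weights are now entirely zero.
dead_filters = int(np.sum(~np.any(pruned != 0, axis=(0, 1, 2))))
print(dead_filters)  # expected: 0, so no filter can be removed
```

Each filter here has 3*3*16 = 144 weights, so for one to be strippable all 144 would have to fall in the bottom half by magnitude, which essentially never happens with unstructured pruning.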
Thanks for your interest in contributing!
Please read the contribution instructions to take further steps. As this looks like a whole new feature, you might also want to file an RFC.
Okay, I'm currently creating an RFC and finishing up my proposal. Can I use you or @Xhark as sponsor for the RFC?
Hello, I want structured pruning, but `tfmot.python.core.sparsity.keras.prune.prune_low_magnitude` currently seems to use the unstructured pruning method. When will structured pruning be applied?
Hello, any updates on the topic? Thank you.
Hello, any updates on this topic?