liyunlu0618
Sorry for keeping you waiting. We're actively working on the initial release of sparse inference support in TFLite. It's hard to give an exact date, but hopefully before Q3...
A spoiler: https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/mnist/mnist_e2e.py Please note that we're still finalizing the API, so the workflow in the released version may look different.
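The core idea behind that example, low-magnitude pruning, can be sketched without TensorFlow: zero out the smallest-magnitude weights until a target sparsity is reached. A minimal NumPy sketch (the function name `prune_to_sparsity` is mine for illustration, not part of the tf-mot API):

```python
import numpy as np

def prune_to_sparsity(weights, sparsity):
    """Zero the smallest-magnitude entries until `sparsity`
    fraction of the weights are exactly zero."""
    flat = np.abs(weights).ravel()
    k = int(np.floor(sparsity * flat.size))
    if k == 0:
        return weights.copy()
    # Threshold = magnitude of the k-th smallest entry.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.9, -0.05, 0.4, 0.01],
              [-0.6, 0.03, -0.8, 0.2]])
pruned = prune_to_sparsity(w, 0.5)
print((pruned == 0).mean())  # prints 0.5
```

In tf-mot the same idea is applied gradually during training via a pruning schedule, so the remaining weights can adapt as sparsity increases.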
For the Conv op we only support these hosted models at the moment: https://github.com/google-research/google-research/tree/master/fastconvnets We need the block config to use SIMD instructions on the Arm NEON architecture. Feel free to...
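To illustrate why a block config matters: if zeros come in aligned blocks (e.g. 1x4), a NEON kernel can skip four multiply-accumulates with a single check instead of testing each weight. A NumPy sketch of block-wise magnitude pruning (function and variable names are mine, not the tf-mot API):

```python
import numpy as np

def prune_blocks(weights, block=(1, 4), sparsity=0.5):
    """Zero whole blocks of shape `block`, ranked by L2 norm,
    until `sparsity` fraction of blocks are zero."""
    rows, cols = weights.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0
    # View as (row blocks, br, col blocks, bc) and score each block.
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    k = int(sparsity * scores.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    mask = (scores > threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

w = np.arange(16, dtype=float).reshape(2, 8)
pruned = prune_blocks(w, block=(1, 4), sparsity=0.5)
# The two lowest-norm 1x4 blocks (both in row 0) are zeroed.
print(pruned)
```

Unstructured (1x1) sparsity gives the pruner more freedom but loses this SIMD-friendly layout, which is why the Conv kernels require a specific block config.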
This is currently available as an experimental feature in TFLite. For sparse CNNs, it needs to run with the XNNPACK delegate. Please refer to [this](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/xnnpack). For sparse RNNs and transformers,...
We recently added support for pruning nested models, see this [PR](https://github.com/tensorflow/model-optimization/pull/658). For subclass models, since Keras doesn't support cloning, we still don't have a model-level API. You can still reconstruct...
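The recursive idea behind that PR, descending into nested containers and wrapping each leaf layer, can be sketched in plain Python (all names here are illustrative, not the tf-mot API; in practice the wrapper is `prune_low_magnitude` applied layer by layer):

```python
def wrap_recursively(layer, wrap, is_container):
    """Apply `wrap` to every leaf layer, descending into nested
    containers the way pruning now recurses into nested models."""
    if is_container(layer):
        return [wrap_recursively(child, wrap, is_container)
                for child in layer]
    return wrap(layer)

# Toy "model": nested lists stand in for nested submodels.
model = ["conv1", ["dense1", "dense2"], "conv2"]
pruned = wrap_recursively(
    model,
    wrap=lambda name: f"pruned_{name}",
    is_container=lambda x: isinstance(x, list),
)
print(pruned)
# prints ['pruned_conv1', ['pruned_dense1', 'pruned_dense2'], 'pruned_conv2']
```

For subclass models, where cloning isn't available, this per-layer wrapping inside your own model-building code is the workaround until a model-level API exists.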
Using quantization instead is definitely an alternative solution. You can also check out this blog post: https://ai.googleblog.com/2021/03/accelerating-neural-networks-on-mobile.html For CNN models, you can use pruning to train the model and deploy it...
Can you follow the whole-model pruning example and try `prune_low_magnitude(your_model)`? Then post the model summary, or let us know if you run into any issues. Thanks!
Thanks for reporting the error. I'm working on a change to support pruning a model recursively. Will let you know once it's checked in.
The PR is merged: https://github.com/tensorflow/model-optimization/pull/658 You can try it with a nightly build and let us know if it works. Thanks!
Can you try a nightly build of tf-mot?