mobilenet_v3_tflite
MobileNet V3 demo using TensorFlow 2.0 and TensorFlow Lite
MobileNet V3 (incl. Minimalistic) - TensorFlow 2.0 & Lite
A MobileNet V3 implementation in TensorFlow 2.0, with TensorFlow Lite (tflite) conversion & benchmarks. I created this repo because there isn't an official Keras/TF 2.0 implementation of MobileNet V3 yet, and none of the existing repos I looked at contained minimalistic or tflite implementations, showed how to use the accelerated hardswish operation provided in tflite, or included profiling results.
Additionally, all the repos I looked at differ from the official MobileNet V3 implementation here. This is probably because they followed early versions of the paper. Some mistakes I found:
- The stem convolution is included in the first bottleneck block
- The Squeeze-Excite module is after the depthwise activation, not before. This is important, otherwise conv & relu operations won't be fused in the tflite model
- The number of squeeze channels should be rounded up to the nearest multiple of 8
- The small model uses 1024 channels in the pooled head, not 1280
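The channel-rounding rule from the list above can be sketched in a few lines. This is only an illustration of "round up to a multiple of 8" as stated here: the helper name `round_up_to_multiple` and the example numbers (72 expanded channels, reduction ratio 4) are assumptions, and the exact helper used in the reference code may differ in detail.

```python
import math


def round_up_to_multiple(channels: float, divisor: int = 8) -> int:
    """Round a channel count up to the nearest multiple of `divisor`."""
    return int(math.ceil(channels / divisor) * divisor)


# Hypothetical example: squeeze width for a block with 72 expanded channels
# and a reduction ratio of 4 (illustrative numbers, not read from the repo).
squeeze_channels = round_up_to_multiple(72 / 4)
print(squeeze_channels)
```

Values that are already a multiple of 8 are left unchanged, so the rule only ever widens the squeeze layer.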
This implementation was verified by comparing the converted tflite models against the official implementation. That said, it deviates from the reference models in three ways: it doesn't use biases in the Squeeze-Excite blocks, it omits the redundant average pool op in the head, and it includes a softmax layer on the output.
Some further notes on my implementation & tflite:
- Since dropout doesn't play nicely with the tflite converter, the dropout layer is only applied during training
- All dense layers are implemented as 4D convolutions. This removes the need for additional reshape ops in the tflite graph
- The tflite converter automatically identifies & replaces the hardswish operation, but only on recent versions of TensorFlow
- I use the experimental MLIR based converter
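Two of the notes above can be illustrated without TensorFlow. The sketch below is plain Python, not code from this repo: it writes out hardswish as defined in the MobileNet V3 paper, `x * ReLU6(x + 3) / 6`, and shows why a dense layer can be expressed as a 1x1 convolution over a 1x1 feature map. All function names and the toy weights are assumptions made for illustration.

```python
def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """Hardswish from the MobileNet V3 paper: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def dense(features, weights, bias):
    """Dense layer: features[C_in], weights[C_in][C_out], bias[C_out]."""
    return [
        sum(f * row[o] for f, row in zip(features, weights)) + bias[o]
        for o in range(len(bias))
    ]


def conv1x1(feature_map, kernel, bias):
    """1x1 convolution over an H x W x C_in map stored as nested lists.

    Every pixel is multiplied by the same [C_in][C_out] kernel, so on a
    1x1 map this is exactly the dense layer above, with no reshape needed.
    """
    return [[dense(pixel, kernel, bias) for pixel in row] for row in feature_map]


# Toy pooled feature map of shape 1 x 1 x 3 and a 3 -> 2 projection.
pooled = [[[1.0, 2.0, 3.0]]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.5, -0.5]

dense_out = dense(pooled[0][0], weights, bias)
conv_out = conv1x1(pooled, weights, bias)
print(dense_out, conv_out[0][0])
```

Because the convolution output keeps its 4D layout (batch dimension omitted here), the feature map can flow straight from the pooling step to the classifier.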
Note: I strongly advise building TensorFlow from source so that the following work correctly:
- Hardswish replacement
- MLIR converter
- XNNPACK
- Performance: Until recently, tflite on Linux only used O2 instead of O3 optimisation
Benchmarks
After tweaking the training process a bit, I was able to meet or exceed the reference accuracy for the small models:
Network | Official Top-1 Accuracy (%) | This Repo Top-1 Accuracy (%) |
---|---|---|
Small | 67.5 | 67.6 |
Small Minimalistic | 61.9 | 63.5 |
Weights are included in the repo.
Here are some benchmarks, including results with the new XNNPACK backend for TensorFlow Lite:
Device | Small (ms) | Small Minimalistic (ms) | Large (ms) | Large Minimalistic (ms) |
---|---|---|---|---|
ODroid N2 | 22.1 | 17.6 | 70.4 | 62.1 |
Samsung Galaxy S8, CPU | 17 | 13.8 | 93 | 44.8 |
Samsung Galaxy S8, CPU, XNNPACK backend | 11.7 | 8.65 | 36.7 | 31.9 |
Samsung Galaxy S8, GPU backend | 13 | 5.16 | 12.7 | 11.3 |
I tested on 1 core over 1000 iterations, with 50 warmup iterations.
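The measurement protocol (50 warmup iterations, then 1000 timed iterations) can be sketched as a small timing harness. This is a hypothetical stand-in, not the benchmark script used for the table above: `benchmark` and the dummy workload are assumptions, and pinning execution to a single core is left to the caller.

```python
import time
from statistics import mean


def benchmark(fn, warmup: int = 50, iterations: int = 1000):
    """Run `fn` untimed for `warmup` calls, then time `iterations` calls.

    Returns the mean latency in milliseconds and the number of timed runs.
    """
    for _ in range(warmup):
        fn()  # warmup runs are executed but not recorded

    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms), len(timings_ms)


# Dummy workload standing in for a single model invocation.
calls = []
avg_ms, timed_runs = benchmark(lambda: calls.append(1))
print(f"{avg_ms:.4f} ms averaged over {timed_runs} runs")
```

Discarding the warmup runs keeps one-off costs such as cache fills and lazy initialisation out of the reported average.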
Usage Instructions
The dataloader is taken from the ResNet50 TensorFlow example dataloader here. You can prepare the tfrecord files using the script here.
You can run the script like this:
python main.py --data_dir <path/to/data> --arch mobilenet_v3_small
Benchmarks were run using the script here.