glDelegateBench
quick and dirty inference time benchmark for TFLite gles delegate
The TensorFlow team announced the TFLite GPU delegate [1] and published related docs [2][3] in January 2019. Apart from the Mobilenet V1 classifier, however, there was no publicly available app to evaluate it, so I wrote a quick and dirty app to evaluate other models.
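
The app's measurement loop is not reproduced in this README. As a rough illustration of how an average per-inference latency is commonly obtained, here is a minimal sketch that assumes a few warm-up runs followed by a timed batch of runs. `average_latency_ms` and `fake_inference` are hypothetical names; `fake_inference` simply sleeps and stands in for a real interpreter invocation.

```python
import time


def average_latency_ms(run_inference, warmup_runs=5, timed_runs=50):
    """Return the mean wall-clock time of one run_inference() call, in ms."""
    # Warm-up runs are excluded from timing so one-off setup costs
    # (cache fills, delegate initialization) do not skew the average.
    for _ in range(warmup_runs):
        run_inference()

    start = time.perf_counter()
    for _ in range(timed_runs):
        run_inference()
    elapsed_s = time.perf_counter() - start

    return elapsed_s * 1000.0 / timed_runs


def fake_inference():
    # Hypothetical stand-in for a model invocation (assumption, not app code).
    time.sleep(0.002)


latency = average_latency_ms(fake_inference, warmup_runs=2, timed_runs=10)
print(f"average latency: {latency:.1f} ms")
```

Averaging over many runs after a warm-up is a common choice for this kind of benchmark, since the first invocations on a GPU backend often include shader compilation or other setup work.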
For the 4 public models mentioned in [1], I got the following numbers on Pixel 2.
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 150 | 75 | 21 |
PoseNet | 183 | 96 | 40 |
DeepLab V3 | 219 | 131 | 91 |
Mobilenet SSD V2 COCO | 264 | 158 | 49 |
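
To read the Pixel 2 table above as ratios, the following sketch computes the GPU speedup over each CPU configuration. The latencies are copied from the table; the `speedup` helper and variable names are illustrative only.

```python
# Pixel 2 latencies in ms, copied from the table above:
# (CPU 1 thread, CPU 4 threads, GPU)
pixel2_ms = {
    "Mobilenet": (150, 75, 21),
    "PoseNet": (183, 96, 40),
    "DeepLab V3": (219, 131, 91),
    "Mobilenet SSD V2 COCO": (264, 158, 49),
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


gpu_speedups = {}
for model, (cpu1, cpu4, gpu) in pixel2_ms.items():
    gpu_speedups[model] = (speedup(cpu1, gpu), speedup(cpu4, gpu))
    vs_one, vs_four = gpu_speedups[model]
    print(f"{model}: {vs_one:.1f}x vs 1 thread, {vs_four:.1f}x vs 4 threads")
```

On these numbers the GPU is about 7.1x faster than a single CPU thread for Mobilenet, but only about 1.4x faster than four CPU threads for DeepLab V3.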
On Xiaomi Mi 9, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 39 | 35 | 15 |
PoseNet | 48 | 47 | 19 |
DeepLab V3 | 61 | 64 | 65 |
Mobilenet SSD V2 COCO | 69 | 75 | 36 |
On Pixel 3a, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 113 | 80 | 52 |
PoseNet | 138 | 96 | 78 |
DeepLab V3 | 173 | 132 | 144 |
Mobilenet SSD V2 COCO | 200 | 167 | 113 |
Check https://github.com/freedomtan/glDelegateBenchmark/ for iOS code
I added a `local_tflite_aar` branch to test ruy, the new TFLite CPU backend.
on Pixel 2, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 117 | 37 | 20 |
PoseNet | 140 | 47 | 39 |
DeepLab V3 | 177 | 72 | 122 |
Mobilenet SSD V2 COCO | 202 | 75 | 60 |
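
To quantify the CPU change between the two Pixel 2 tables, this sketch computes the percentage reduction in latency. It assumes the two runs differ mainly in the CPU backend (default vs. ruy), which the README implies but does not state outright. The helper name `reduction_pct` is illustrative.

```python
# Pixel 2 CPU latencies in ms: (1 thread, 4 threads)
before_ms = {
    "Mobilenet": (150, 75),
    "PoseNet": (183, 96),
    "DeepLab V3": (219, 131),
    "Mobilenet SSD V2 COCO": (264, 158),
}
with_ruy_ms = {
    "Mobilenet": (117, 37),
    "PoseNet": (140, 47),
    "DeepLab V3": (177, 72),
    "Mobilenet SSD V2 COCO": (202, 75),
}


def reduction_pct(old_ms, new_ms):
    """Percentage by which latency dropped going from old_ms to new_ms."""
    return 100.0 * (old_ms - new_ms) / old_ms


ruy_gain = {
    model: tuple(
        reduction_pct(old, new)
        for old, new in zip(before_ms[model], with_ruy_ms[model])
    )
    for model in before_ms
}

for model, (one_thread, four_threads) in ruy_gain.items():
    print(f"{model}: -{one_thread:.0f}% (1 thread), -{four_threads:.0f}% (4 threads)")
```

With these figures, four-thread latency drops by roughly 45 to 53 percent, while single-thread latency drops by roughly 19 to 24 percent.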
on Pixel 3a, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 107 | 44 | 51 |
PoseNet | 131 | 57 | 77 |
DeepLab V3 | 164 | 82 | 145 |
Mobilenet SSD V2 COCO | 184 | 86 | 113 |
Update Oct 31, 2019: the nightly AAR binaries now include ruy and the OpenCL backend.
Update Dec 8, 2019: the December update for Pixel 3a came with DSP and GPU NNAPI 1.2 drivers, so we can now get NNAPI numbers on Pixel 3a.
on Pixel 2 (w/ libOpenCL-pixel.so from Pixel 3), I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU OpenCL (ms) | GPU GL Compute Shader (ms) |
---|---|---|---|---|
Mobilenet | 118 | 34 | 10 | 21 |
PoseNet | 142 | 43 | 14 | 41 |
DeepLab V3 | 174 | 75 | 21 | 69 |
Mobilenet SSD V2 COCO | 202 | 73 | 18 | 48 |
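
One way to summarize the OpenCL vs. GL compute shader columns above as a single figure is the geometric mean of the per-model ratios, a common choice for averaging speedups. This is a sketch over the four rows of that table, not something the benchmark app reports; it uses `statistics.geometric_mean`, which requires Python 3.8 or later.

```python
from statistics import geometric_mean

# Pixel 2 GPU latencies in ms, copied from the table above:
# (OpenCL, GL compute shader)
pixel2_gpu_ms = {
    "Mobilenet": (10, 21),
    "PoseNet": (14, 41),
    "DeepLab V3": (21, 69),
    "Mobilenet SSD V2 COCO": (18, 48),
}

# Per-model ratio: how many times faster OpenCL is than GL compute shaders.
opencl_ratios = {
    model: gl_ms / opencl_ms
    for model, (opencl_ms, gl_ms) in pixel2_gpu_ms.items()
}

overall = geometric_mean(opencl_ratios.values())
print(f"OpenCL is about {overall:.1f}x faster overall")
```

Across these four models the OpenCL backend comes out roughly 2.7x faster than the GL compute shader backend, with per-model ratios between about 2.1x and 3.3x.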
on Pixel 3a, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) | NNAPI (ms) |
---|---|---|---|---|
Mobilenet | 107 | 44 | 28 | 25 |
PoseNet | 131 | 57 | 38 | 32 |
DeepLab V3 | 164 | 82 | 60 | 186 |
Mobilenet SSD V2 COCO | 184 | 86 | 54 | 249 |
on Pixel 4, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU Delegate (ms) | NNAPI (ms) |
---|---|---|---|---|
Mobilenet | 42 | 13 | 8 | 7 |
PoseNet | 52 | 15 | 11 | 11 |
DeepLab V3 | 66 | 25 | 20 | 98 |
Mobilenet SSD V2 COCO | 70 | 24 | 16 | 86 |
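
With four backends per model, it can help to pick out the fastest one programmatically. The sketch below does this for the Pixel 4 table above and reports ties explicitly. The function name `fastest_backends` is illustrative.

```python
backends = ("CPU 1 thread", "CPU 4 threads", "GPU Delegate", "NNAPI")

# Pixel 4 latencies in ms, copied from the table above, in `backends` order.
pixel4_ms = {
    "Mobilenet": (42, 13, 8, 7),
    "PoseNet": (52, 15, 11, 11),
    "DeepLab V3": (66, 25, 20, 98),
    "Mobilenet SSD V2 COCO": (70, 24, 16, 86),
}


def fastest_backends(latencies_ms):
    """Return every backend that achieves the lowest latency (ties included)."""
    best = min(latencies_ms)
    return [name for name, ms in zip(backends, latencies_ms) if ms == best]


fastest = {model: fastest_backends(ms) for model, ms in pixel4_ms.items()}

for model, winners in fastest.items():
    print(f"{model}: {' / '.join(winners)}")
```

On these numbers NNAPI is marginally ahead for Mobilenet and ties the GPU delegate for PoseNet, while for DeepLab V3 and Mobilenet SSD V2 COCO it is slower than even a single CPU thread.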
[1] https://medium.com/tensorflow/tensorflow-lite-now-faster-with-mobile-gpus-developer-preview-e15797e6dee7
[2] https://www.tensorflow.org/lite/performance/gpu
[3] https://www.tensorflow.org/lite/performance/gpu_advanced