iPhone TOPS estimation
I developed a simple program that uses Apple's Metal Performance Shaders Graph (MPSGraph) to measure and estimate the compute capacity of the Apple Neural Engine (ANE). The program is available on GitHub at https://github.com/freedomtan/measure_ane_capacity. With it, I obtained numbers quite close to what Apple claims.
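The exact methodology lives in the linked repository; as a rough sketch of the arithmetic behind such an estimate, the idea is to time a large matrix multiplication and convert the operation count into TOPS. The function name below is illustrative, not from the repo:

```python
def estimated_tops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert one timed (m x k) @ (k x n) matmul into TOPS.

    A matmul performs m*n*k multiply-accumulates, conventionally
    counted as 2*m*n*k operations (one multiply plus one add each).
    """
    ops = 2 * m * n * k
    return ops / seconds / 1e12  # tera-operations per second

# Example: an 8192 x 8192 x 8192 matmul finishing in 100 ms
print(estimated_tops(8192, 8192, 8192, 0.1))  # ~11 TOPS
```

In practice you would run the matmul many times, discard warm-up iterations, and take the best or median timing before plugging it into this formula.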
Based on my observations, I believe that to achieve better performance, we may need to convert TensorFlow (TF) or PyTorch models to MPSGraph directly, rather than going through Core ML.
Why can't we get these numbers with Core ML Tools? It seems that, with either the palette or the linear quantization scheme, the models still compute in float16, as we already knew.