Check Mobilenet V4 Large on iPhones
Currently, I got:
| device | Mobilenet V4 Large (qps) | Mobilenet EdgeTPU (qps) |
|---|---|---|
| iPhone 13 | 220.11 | 617.78 |
| iPhone 14 Pro | 300.06 | 970.95 |
| iPhone 15 Pro Max | 332.95 | 1145.05 |
Roughly, > 300 qps for iPhone 13 should be possible.
https://github.com/mlcommons/mobile_app_open/pull/821#issuecomment-1976609360
@freedomtan please share info on how to check the model accuracy for Mobilenet V4. What dataset do I need to use, and do we have specific steps to set up the accuracy test on an iOS device? Thanks
To validate the accuracy of image classification models, we use the full ImageNet 2012 validation dataset (50,000 images) from https://www.image-net.org/index.php.
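For reference, a minimal sketch of how top-1 accuracy could be computed from per-image predictions against the ImageNet 2012 ground-truth labels. The file names and dict-based layout here are illustrative assumptions, not the app's actual output format:

```python
def top1_accuracy(predictions, ground_truth):
    """Compute top-1 accuracy (%) given two dicts mapping image name -> class index."""
    correct = sum(1 for name, label in ground_truth.items()
                  if predictions.get(name) == label)
    return 100.0 * correct / len(ground_truth)

# Hypothetical example with three validation images:
preds = {"ILSVRC2012_val_00000001.JPEG": 65,
         "ILSVRC2012_val_00000002.JPEG": 970,
         "ILSVRC2012_val_00000003.JPEG": 230}
truth = {"ILSVRC2012_val_00000001.JPEG": 65,
         "ILSVRC2012_val_00000002.JPEG": 970,
         "ILSVRC2012_val_00000003.JPEG": 231}
print(top1_accuracy(preds, truth))  # prints 66.66666666666667
```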
@freedomtan I've tried the accuracy test for the CoreML backend and the TF backend for the Image Classification task v1 and v2. In each case it crashes after 100%: EXC_BAD_ACCESS (code=1, address=0x27c8) in compute accuracy. I'm going to check what the problem is.
I've found that the validation results were expected in a different format than the one I had (only the category number, without the image name). I can run the accuracy test now, but it gives 0.05% accuracy, so it might again be a dataset issue. When I tried the dataset from our tests it gives 100%, but we have only 10 images there.
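To illustrate the format mismatch described above, a small parser sketch that accepts either a bare category number per line or an `image_name label` pair. The exact layout the app expects is an assumption here:

```python
def parse_ground_truth(lines):
    """Parse validation labels: each line is either '<label>' or '<image> <label>'.

    Returns (image_name_or_None, label) tuples in line order; when only a
    category number is present, the line position implies which image it is."""
    parsed = []
    for line in lines:
        parts = line.split()
        if len(parts) == 1:
            parsed.append((None, int(parts[0])))
        else:
            parsed.append((parts[0], int(parts[1])))
    return parsed

print(parse_ground_truth(["65", "ILSVRC2012_val_00000002.JPEG 970"]))
# prints [(None, 65), ('ILSVRC2012_val_00000002.JPEG', 970)]
```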
@RSMNYS I don't get it.
Is this the original Mobilenet EdgeTPU model we had, or a new V4 one? As far as I can remember, we checked that we can get the expected accuracy number for the original one.
Please check
- run all the benchmark items with TFLite + CoreML Delegate backend as the baseline (optional)
- Mobilenet EdgeTPU's accuracy numbers (including both non-offline and offline ones)
- accuracy numbers of other models: as far as I can remember, all models except MobileBERT should have good enough accuracy.
FYR, on an iPhone 13, for Mobilenet EdgeTPU I got 76.21% running a binary built from the latest master branch.
Thanks, everything works now. For iPhone 14 Pro, I get the same 76.21%. Will try with ImageNet V2 and different optimised models based on it.
All tests were done on iPhone 14 Pro
| Model Name | Performance (QPS) | Accuracy (%) | Size (MB) |
|---|---|---|---|
| MobilenetV4_Large.mlmodel | 268.58 | 81.82% | 130.1 |
| MobilenetV4_Large.mlpackage | 251.36 | 82.73% | 65.5 |
| MobilenetV4_Large.mlpackage (8 bit quantization) | 299.25 | 82.7% | 33.3 |
| MobilenetV4_Large.mlpackage (20% sparsity) | 258.39 | 82.26% | 56.6 |
| MobilenetV4_Large.mlpackage (30% sparsity) | 244.7 | 80.83% | 50.1 |
| MobilenetV4_Large.mlpackage (40% sparsity) | 261.22 | 74.15% | 43.6 |
| MobilenetV4_Large.mlpackage (30% sparsity, 8 bit quantization) | 299.4 | 80.83% | 50.1 |
Also, during the tests I noticed a performance drop when the device is warm (after several tests); sometimes it drops from 300 to 200 qps. Please also check the screenshot: it shows the tests for MobilenetV4_Large.mlpackage (8 bit quantization) only, and you can see how much the performance can differ. @freedomtan
Here is the link to models: https://github.com/RSMNYS/mobile_models/tree/main/v4_0/CoreML
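Since throttling shows up as run-to-run variance, it can help to summarize repeated runs before trusting a single number. A small sketch (the qps values below are made up for illustration):

```python
import statistics

def summarize_runs(qps_runs):
    """Summarize repeated benchmark runs to spot thermal throttling:
    a large spread between min and max usually means the device got warm."""
    return {
        "min": min(qps_runs),
        "median": statistics.median(qps_runs),
        "max": max(qps_runs),
        "spread_pct": 100.0 * (max(qps_runs) - min(qps_runs)) / max(qps_runs),
    }

# Hypothetical qps numbers from repeated runs on a warming device:
print(summarize_runs([299.3, 297.8, 251.4, 210.9, 205.2]))
```

A spread above, say, 10% between the fastest and slowest run suggests letting the device cool down and re-measuring.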
@RSMNYS thermal throttling is a well-known issue on cell phones. A typical way to get the numbers we want is to cool down the device before running a new test :-)
Please try to do the first 3 items and ensure that there is no thermal throttling, e.g., cold start, wait for 5 minutes, and then measure the performance numbers.
Note that currently we don't allow model pruning (sparsity above) for submission. If we want to allow that, we need to change our rules.
All tests were done on iPhone 14 Pro
| Model Name | Performance (QPS) | Accuracy (%) | Size (MB) |
|---|---|---|---|
| MobilenetV4_Large.mlmodel | 294.85 | 81.2% | 124 |
| MobilenetV4_Large.mlpackage | 296.93 | 82.73% | 65.5 |
| MobilenetV4_Large.mlpackage (8 bit quantization) | 295.11 | 82.7% | 33.3 |
These numbers look reasonable now. But let's see if we can further improve it.
Let's check if @colbybanbury can comment on this.
MobilenetV4 was made public last week, see https://arxiv.org/abs/2404.10518 or https://arxiv.org/html/2404.10518v1. According to the numbers in the paper, it should be possible to get > 300 qps on an iPhone 13.
The V4 paper results use an iPhone 13 and fp16 quantization. The model was also derived from a PyTorch equivalent in order to be in (batch, channel, height, width) tensor format, which I measured to be slightly faster.
I recommend using fp16 on iPhones older than the 15 Pro, where they added int8-int8 compute.
Happy to help if needed!
@RSMNYS From the paper https://arxiv.org/abs/2404.10518
> for benchmarks on the Apple Neural Engine (conducted on an iPhone 13 with iOS 16.6.1, CoreMLTools 7.1, and Xcode 15.0.1 for profiling), PyTorch models were converted to CoreML’s MLProgram format in Float16 precision, with float16 MultiArray inputs to minimize input copying
@freedomtan can you please point us to where we can get the MobileNet V4 PyTorch model? Currently we have only the TFLite one.
The PyTorch model has yet to be officially released. Sorry for the delay!
The TensorFlow model should still get similar latency results, but let me know if I can help with anything.
@freedomtan to try it on iPhone 13 again.
As I got before, on iPhone 13, it's about 220 qps
Let's try to have PyTorch model (with weights from the TensorFlow model).
@colbybanbury can you please tell us if you use mlmodel or mlpackage CoreML models in your tests?
I used MLPackage
@RSMNYS With Xcode 16.0 beta and iOS 18 + MLPackage targeting iOS 15 or later, it's possible to get per-op time. Please check https://developer.apple.com/videos/play/wwdc2024/10161/?time=927
Per-op profiling actually is possible on iOS 17.4+ / MacOS 14.4+. I wrote a little command line program and tested it on my Macbook Pro M1, see https://github.com/freedomtan/coreml_modelc_profling
FWIW There's still no official weights from the paper authors, but I've trained a number of PyTorch native MobileNetV4 models and made them available in timm. The conv-medium runs quite nicely on CPU w/o much extra optimization.
https://github.com/huggingface/pytorch-image-models?tab=readme-ov-file#june-12-2024
@rwightman: FYI, thanks to @colbybanbury, one of the co-authors of the paper, we do have a MobileNetV4-Conv-Large saved_model and tflites, see https://github.com/mlcommons/mobile_open/tree/main/vision/mobilenetV4
@RSMNYS `pip install git+https://github.com/huggingface/pytorch-image-models.git`, then:

```python
import timm
import torch
import coremltools as ct

# Load the pretrained MobileNetV4-Conv-Large weights from timm.
torch_model = timm.create_model("hf-hub:timm/mobilenetv4_conv_large.e600_r384_in1k", pretrained=True)
torch_model.eval()

# Trace the model with random data.
example_input = torch.rand(1, 3, 384, 384)
traced_model = torch.jit.trace(torch_model, example_input)
out = traced_model(example_input)

# Convert to a Core ML ML Program (defaults to float16 precision).
model = ct.convert(
    traced_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(shape=example_input.shape)]
)
model.save("mobilenetv4.mlpackage")
```
This model takes around 3.10 ms (> 300 qps) on my iPhone 13; on an iPhone 14 Pro it takes 2.29 ms (≈ 436 qps).
These match what @colbybanbury and others said in the paper. Please try to see if we can get the same performance with the TF saved_model.
Thanks @rwightman
@RSMNYS and @anhappdev According to the coremltools 8.0b1 documentation on quantization, it's possible to create a calibrated quantized A8W8 PTQ model from an existing Core ML model.
I used random data as calibration data. Then I got the following (unit: ms):
| device | fp32 | quantized a8w8 |
|---|---|---|
| iPhone 13 | 3.10 | 2.23 |
| iPhone 14 Pro | 2.29 | 1.83 |
| iPhone 15 Pro | 2.24 | 1.38 |
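For comparison with the qps tables earlier in the thread, single-stream qps is just the inverse of per-inference latency. A quick sketch using the fp32 latencies above:

```python
def qps_from_latency_ms(latency_ms):
    """Single-stream throughput: queries per second = 1000 / per-inference latency (ms)."""
    return 1000.0 / latency_ms

# fp32 latencies from the table above
for device, ms in [("iPhone 13", 3.10), ("iPhone 14 Pro", 2.29), ("iPhone 15 Pro", 2.24)]:
    print(f"{device}: {qps_from_latency_ms(ms):.0f} qps")
# prints:
# iPhone 13: 323 qps
# iPhone 14 Pro: 437 qps
# iPhone 15 Pro: 446 qps
```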
Maybe we can use "real" calibration data to check if quantized int8 models could meet accuracy thresholds.
I will try to do that.
@freedomtan Good to hear that. For quantization, some weights quantize 'better' (smaller performance drop) than others; the training hparams have an impact. I'd be curious to know how the timm weights I've trained so far fare in that regard.