[onert] User Scenario of On-Device Compiler on ONERT
Let's consider a user scenario with the on-device compiler on onert.
Q. What functionality do ONE's users (developers) expect from the nnfw API (including the config file)?
- (allow users to) enable/disable On-Device Compilation (ODC) in the app
- (allow users to) choose when ODC will be triggered
- choose how ODC is done
- partition information(?)
- Successful Workflow
```mermaid
flowchart TD
subgraph "nnpkg"
n1(F32 circle)
end
subgraph "nnpkg2"
n2(Quantized circle)
end
subgraph "nnpkg3"
n3(tvn)
end
nnpkg-->|1. collect rep. data|nnpkg-->|2. ondevice quantization|nnpkg2-->|3. ondevice compilation|nnpkg3
```
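The three steps in the flowchart can be sketched as a simple pipeline. Every function name below is a hypothetical stand-in for illustration only, not an onert API:

```python
# Hypothetical sketch of the ODC pipeline from the flowchart above.
# None of these functions exist in onert; they only name the steps.

def collect_rep_data(circle_path: str) -> str:
    """Step 1: run the F32 circle model on-device and record
    representative input data (e.g. min/max ranges) for calibration."""
    return circle_path + ".minmax"

def quantize_on_device(circle_path: str, rep_data: str) -> str:
    """Step 2: produce a quantized circle using the recorded ranges."""
    return circle_path.replace(".circle", ".q8.circle")

def compile_on_device(q_circle_path: str) -> str:
    """Step 3: compile the quantized circle into a backend binary (tvn)."""
    return q_circle_path.replace(".q8.circle", ".tvn")

rep = collect_rep_data("model.circle")
q_model = quantize_on_device("model.circle", rep)
tvn = compile_on_device(q_model)
print(tvn)  # model.tvn
```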
- What if ... ?
  - accuracy drop
  - long compilation time
  - unable to compile due to memory consumption, unsupported ops, etc.
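One way to guard against the failure cases above is a fallback path: if ODC fails, keep running the original circle model. A minimal sketch, assuming a compile step that raises on failure (hypothetical code, not onert's actual behavior):

```python
def choose_model(compile_fn, circle_path: str) -> str:
    """Try on-device compilation; on any failure (unsupported op,
    out of memory, ...) fall back to the original circle model."""
    try:
        return compile_fn(circle_path)   # e.g. produces "model.tvn"
    except (RuntimeError, MemoryError):
        return circle_path               # keep running the F32 circle

# Hypothetical compilers for illustration:
ok = lambda p: p.replace(".circle", ".tvn")
def broken(p):
    raise RuntimeError("unsupported op")

print(choose_model(ok, "model.circle"))      # model.tvn
print(choose_model(broken, "model.circle"))  # model.circle
```

Accuracy drop and compilation time are harder to guard automatically; they may need a policy decision (e.g. a time budget, or shipping pre-recorded calibration data).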
Conclusion from offline discussion
- Let's minimize the user-level API
- Enable ODC via the config file; without this flag, ODC is disabled.
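A minimal sketch of how such a flag could be read, assuming config.cfg holds one `key value` pair per line (the parsing code is illustrative, not onert's actual implementation):

```python
def odc_enabled(config_text: str) -> bool:
    """Return True only when 'OnDeviceCompilation 1' appears in the config.
    A missing flag (or '0') means ODC stays disabled -- the default."""
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "OnDeviceCompilation":
            return parts[1] == "1"
    return False  # no flag -> ODC disabled

print(odc_enabled("OnDeviceCompilation 1"))  # True
print(odc_enabled(""))                       # False
```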
Here is a user scenario, plus ONE's internal workflow in detail, with a quantized circle:
- Assumption A1. model.q8.circle is a compilable circle with (trix-compatible) q8 quantization
- The user implements the app with the nnfw API as usual, but the nnpackage for this app looks like:

```
nnpackage
├── metadata
│   ├── MANIFEST
│   └── config.cfg
└── model.circle
```
config.cfg

```
...
OnDeviceCompilation 1
...
```
- Compile model.circle into model.tvn
  - 2-1. Compilation is done before the first run
- Reconstruct the nnpkg with the new tvn binary
  - 3-1. Reconstruction spec: where to place the new tvn binary, and how to update the MANIFEST?
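One possible shape for the reconstruction step, sketched in Python. The `models` / `model-types` keys follow the usual nnpackage MANIFEST convention, but the exact keys and placement policy are exactly the open questions of 3-1, so treat this as an assumption:

```python
import json

def reconstruct_manifest(manifest_text: str) -> str:
    """Rewrite MANIFEST so it points at the newly compiled tvn binary
    instead of the original circle model. Assumes nnpackage-style
    'models' / 'model-types' arrays; the real spec is still undecided (3-1)."""
    manifest = json.loads(manifest_text)
    manifest["models"] = ["model.tvn"]
    manifest["model-types"] = ["tvn"]
    return json.dumps(manifest, indent=2)

before = json.dumps({
    "major-version": "1", "minor-version": "0", "patch-version": "0",
    "models": ["model.circle"], "model-types": ["circle"],
})
print(reconstruct_manifest(before))
```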
Is Apple doing on-device compiling in iOS? : https://developer.apple.com/documentation/coreml/mlmodel/3931181-compilemodel
Need to investigate more.
MLModel.CompileModel(NSUrl, NSError) Method (CoreML) | Microsoft Docs gives more info than Apple :)
After investigating the web docs, IMHO compilation in Core ML is like compilation (nnfw_prepare in nnfw.h) in onert, not on-device compilation for an NPU.
The https://github.com/hollance/neural-engine repo shows many details (though they are just guesses). According to the repo, Core ML generates an execution plan (e.g., which part of the mlmodel will run on the GPU/CPU/NPU) during compilation. That is similar to backend assignment in ONERT.