Passing custom/additional data to kernels
Hello, I have the following use case that I would like to cover.
How can I provide an efficient kernel for a layer without breaking the compatibility of the model?
Consider:
- a layer of a NN model (e.g. a linear layer) that could be accelerated by a dedicated hardware module, e.g. imagine you have M cores and you split the linear layer into M chunks
- the custom hardware module could benefit from additional information (e.g. how many elements each core should process); a rough sketch of this is given below
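For concreteness, a small sketch of how the per-core element counts could be derived; the layer shape and core count below are made up, not taken from a real model:

```python
import numpy as np

# Made-up example: split the output rows of a linear (fully connected) layer
# across M accelerator cores and count how many weight elements each core
# would process. Shapes and M are placeholders.
M = 4
weights = np.zeros((96, 128), dtype=np.float32)  # (out_features, in_features)

row_chunks = np.array_split(weights, M, axis=0)
elements_per_core = [chunk.size for chunk in row_chunks]
print(elements_per_core)  # [3072, 3072, 3072, 3072]
```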
Here are some ideas:
- Use a custom operator:
  - for each layer that you want to accelerate, create a custom operator
  - modify the NN model, replacing the original layer with a custom operator that is functionally equivalent to the original one
  - retrain the NN model
  - convert the model to a TensorFlow Lite model, adding the new custom operator (and possibly passing additional info to the new custom layer through the `custom_options` field of the `Operator` table of the flatbuffer); see the conversion sketch right after this list
  - provide an implementation of the operator to the interpreter that reads the `custom_options` field and executes the layer accordingly
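A rough sketch of what the conversion step of this route could look like. The model here is only a placeholder (a plain `Dense` layer stands in for the custom layer), and `allow_custom_ops` simply tells the converter not to reject ops it does not recognize:

```python
import tensorflow as tf

# Placeholder model: in the real flow this would be the retrained network in
# which the original linear layer was replaced by the custom operator.
model = tf.keras.Sequential([tf.keras.layers.Dense(96)])
model.build(input_shape=(None, 128))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True  # keep custom ops instead of failing the conversion
tflite_model = converter.convert()

with open("mlp_custom.tflite", "wb") as f:
    f.write(tflite_model)
```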
- Keep the same (builtin) operator and use a custom conversion tool that adds information in the `custom_options` field. In this way I can avoid the first three steps of the previous list and do the following:
  - convert the model to a TensorFlow Lite model as usual, then pass the additional info to the existing layer by writing it into the `custom_options` field of its `Operator` in the flatbuffer, e.g. like this:
```python
import numpy as np
import tensorflow as tf
from tensorflow.lite.python import schema_py_generated as schema_fb


def load_model(save_path: str):
    with open(save_path, "rb") as f:
        return f.read()


tflite_quantized = load_model("models/mlp_int8.tflite")
aModel = schema_fb.ModelT.InitFromPackedBuf(tflite_quantized, 0)


def BuiltinCodeToName(code):
    """Converts a builtin op code enum to a readable name."""
    for name, value in schema_fb.BuiltinOperator.__dict__.items():
        if value == code:
            return name
    return None


# List the operators of the first subgraph to find the one to annotate.
for i, op in enumerate(aModel.subgraphs[0].operators):
    op_code = aModel.operatorCodes[op.opcodeIndex].builtinCode
    print(f"[{i}] : {BuiltinCodeToName(op_code)} ({op_code})")

### FROM HERE
custo = np.ones(10, dtype=np.uint8)
aModel.subgraphs[0].operators[4].customOptions = custo
### TO HERE

from tflite_support import flatbuffers

b = flatbuffers.Builder(0)
b.Finish(aModel.Pack(b))
model_buff = b.Output()


def save_tflite_model(tflite_model, save_dir, model_name):
    """Save the converted tflite model.

    Args:
        tflite_model (binary): the converted model in serialized format.
        save_dir (str): the save directory
        model_name (str): model name to be saved
    """
    import os
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    save_path = os.path.join(save_dir, model_name)
    with open(save_path, "wb") as f:
        f.write(tflite_model)


save_tflite_model(model_buff, "MLP_models", "mlp_int8.tflite")
```
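As a follow-up to the script, a sketch of what a more meaningful payload than `np.ones(10)` could look like, plus a read-back check of the annotated file. The int32 layout of the per-core counts is just an assumption that the accelerated kernel would have to mirror when parsing `custom_options` (TFLite's own custom ops usually encode `custom_options` as a FlexBuffer, which would work here as well):

```python
import numpy as np
from tensorflow.lite.python import schema_py_generated as schema_fb

# A payload carrying the per-core element counts, packed as little-endian
# int32 and viewed as uint8 (customOptions is a byte array). The layout is
# an assumption; the kernel reading custom_options must agree on it.
per_core_elements = np.asarray([3072, 3072, 3072, 3072], dtype="<i4")
blob = np.frombuffer(per_core_elements.tobytes(), dtype=np.uint8)

# Read-back check: re-open the file written above and confirm that the
# annotated operator now carries the custom_options bytes.
with open("MLP_models/mlp_int8.tflite", "rb") as f:
    annotated = schema_fb.ModelT.InitFromPackedBuf(f.read(), 0)

print(list(annotated.subgraphs[0].operators[4].customOptions))
```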
  - I will have to modify the behaviour of the micro interpreter so that builtin operators are allowed to carry `custom_options`
  - provide a suitable implementation of the operator that uses the operator's `custom_options` to do something smart
With the second approach I see the following advantages:
- I do not have to modify the model: I can just write a Python script that looks at the model and "annotates the layers with custom_options" when needed.
- I keep compatibility with the original model and can switch between accelerated and non-accelerated kernels (e.g. in certain cases, due to the fixed cost of starting the dedicated hardware module, the reference implementation or another operator implementation is better suited); a quick check of this is sketched after this list.
- I lower the complexity of accelerating operators
- No need to retrain the model from scratch
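Regarding the compatibility point, a quick sanity check (file name taken from the script above): the annotated model should still run on the stock TFLite interpreter, since the reference runtime ignores `custom_options` on builtin operators, which is also why the micro interpreter change mentioned earlier is needed on the accelerated path:

```python
import numpy as np
import tensorflow as tf

# Run the annotated model with the stock interpreter; the extra
# custom_options bytes on the builtin operator are simply ignored.
interpreter = tf.lite.Interpreter(model_path="MLP_models/mlp_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()

out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]))
```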