Let's find a reference model for RNN support
For the milestone in https://github.com/Samsung/ONE/projects/9#card-79474017
Candidate 1
- one-cmds PyTorch (or ONNX) LSTM op import fails · Issue #8217
- model link: https://github.com/Samsung/ONE/files/7860779/LSTM.zip
Candidate 2
Based on https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/experimental_new_converter/Keras_LSTM_fusion_Codelab.ipynb
We can generate other RNN models such as SimpleRNN, LSTM, and GRU. Here is an example script that generates a model with GRU:
```python
# !pip install tensorflow==2.7.0
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

# model directory.
MODEL_DIR = "keras_lstm"
model.save(MODEL_DIR, save_format="tf", signatures=concrete_func)

converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_DIR)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```
- model link with GRU: gru.zip
- model link with LSTM: https://github.com/Samsung/ONE/files/8377917/model_LSTM_keras.zip
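To double-check which ops ended up in the converted file, a small sketch using the TFLite model analyzer (available from roughly TF 2.7, which the script above pins) can help; the per-subgraph `Op#N` listing it prints is also handy later when cutting subgraphs:

```python
import tensorflow as tf

# Print each subgraph and its operators (Op#0, Op#1, ...). For the GRU
# model a WHILE op with separate cond/body subgraphs is expected, while
# the LSTM variant should show a fused UnidirectionalSequenceLSTM instead.
tf.lite.experimental.Analyzer.analyze(model_path='model.tflite')
```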
Candidate 3
Based on PyTorch tutorials:
- https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
- https://pytorch.org/tutorials/beginner/chatbot_tutorial.html
Considering the objective in https://github.com/Samsung/ONE/projects/9 ("RNN Model with single while loop of non-dynamic tensor"), a while loop is required.
With the PyTorch tutorials above, I could prepare a simple encoder model with the script below:
```python
import torch
import torch.onnx
import onnx

torch.manual_seed(1)

class SimpleEncoder(torch.nn.Module):
    def __init__(self, hidden_size, n_layers=1):
        super(SimpleEncoder, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.lstm = torch.nn.LSTM(hidden_size, hidden_size, n_layers)

    def forward(self, input_seq, input_lengths, hidden=None):
        outputs, hidden = self.lstm(input_seq, hidden)
        return outputs, hidden

n_layers = 1
hidden_size = 16
encoder = SimpleEncoder(hidden_size, n_layers)

inputs = torch.randn(n_layers, 2, hidden_size)
print("inputs =", inputs)
h0 = torch.randn(n_layers, 2, hidden_size)
c0 = torch.randn(n_layers, 2, hidden_size)
outputs, (hn, cn) = encoder(inputs, 1, (h0, c0))
print("outputs =", outputs)
print("hn =", hn)
print("cn =", cn)

input_names = ["input", "h0", "c0"]
output_names = ["output", "hn", "cn"]
# NOTE `input_lengths` is unused in forward(); a dummy value is passed here
# so that (h0, c0) binds to the `hidden` argument and h0/c0 appear as
# actual graph inputs in the exported ONNX model.
torch.onnx.export(encoder,
                  (inputs, 1, (h0, c0)),
                  "simple_encoder_01.onnx",
                  input_names=input_names,
                  output_names=output_names)

def save_with_shape(fname, fname_si):
    model = onnx.load(fname)
    model_si = onnx.shape_inference.infer_shapes(model)
    onnx.save(model_si, fname_si)

save_with_shape("simple_encoder_01.onnx", "simple_encoder_01_si.onnx")
```
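To confirm what the exporter produced, a small sketch using the onnx package already imported above (the op list should include an LSTM node, or a Loop if the exporter unrolls differently):

```python
import onnx

# Dump the op types of the exported graph to verify an LSTM node is present.
m = onnx.load("simple_encoder_01.onnx")
print([n.op_type for n in m.graph.node])
```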
In Candidate 2, replacing GRU with LSTM yields a model without a WHILE op; instead the loop is fused into UnidirectionalSequenceLSTM, which is a built-in TFLite operation. model_LSTM_keras.zip
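For clarity, the only change to the Candidate 2 script is the recurrent layer; a minimal sketch (same layer parameters assumed):

```python
import tensorflow as tf

# Swapping GRU for LSTM; the TFLite converter fuses this into a single
# UnidirectionalSequenceLSTM op, so no WHILE loop appears in the model.
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.LSTM(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
```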
cc @ragmani
I tried to obtain a fully quantized (w8a8) model as follows, but failed to get a fully quantized one.
```python
# !pip install tensorflow==2.7.0
import numpy as np
import tensorflow as tf

def representative_dataset():
    for _ in range(100):
        data = np.random.rand(1, 28, 28)
        yield [data.astype(np.float32)]

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

# model directory.
MODEL_DIR = "keras_lstm"
model.save(MODEL_DIR, save_format="tf", signatures=concrete_func)

converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
with open('model_q8.tflite', 'wb') as f:
    f.write(tflite_model)
```
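As an aside, by default the converter silently keeps ops it cannot quantize in float and brackets them with Quantize/Dequantize. A hedged sketch of the usual way to turn that fallback into a hard error (not verified on this exact model):

```python
import tensorflow as tf

# Hypothetical addition to the converter setup above: restrict conversion
# to int8 builtin kernels. Ops without an int8 kernel (likely WHILE here)
# then fail conversion instead of silently staying float.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
```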
Here is a Netron snapshot (image omitted), which shows Dequantize and Quantize ops around the While operation. It seems that TFLiteConverter does not support quantization of the While op.
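One quick way to confirm which tensors the converter left in float (a sketch using the tf.lite Interpreter; by default only the main subgraph is listed, which is where the Quantize/Dequantize wrappers around While live):

```python
import tensorflow as tf

# List tensor dtypes of the quantized model; float32 entries around the
# While op indicate the parts the converter left unquantized.
interpreter = tf.lite.Interpreter(model_path='model_q8.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    print(detail['name'], detail['dtype'])
```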
I tried to quantize the body subgraph but failed.
- Error message
```
2022-09-22 21:50:07.264869: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/onecc-0.1.0+220921195027-py3.8.egg/onecc/cli/onecc.py", line 40, in invoke
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/onecc', 'optimize', '--input_path', '/tmp/onecc_afwww81g/model_body.0.import.circle', '--output_path', '/tmp/onecc_afwww81g/model_body.0.import.0.opt.circle', '--fuse_add_with_tconv', '--fuse_add_with_fully_connected', '--fuse_batchnorm_with_conv', '--fuse_batchnorm_with_tconv', '--fuse_batchnorm_with_dwconv', '--fuse_activation_function', '--fuse_instnorm', '--fold_dequantize', '--fold_densify', '--substitute_padv2_to_pad', '--substitute_splitv_to_split', '--substitute_squeeze_to_reshape', '--resolve_customop_add', '--resolve_customop_batchmatmul', '--resolve_customop_max_pool_with_argmax', '--resolve_customop_splitv', '--transform_min_max_to_relu6', '--transform_min_relu_to_relu6', '--replace_non_const_fc_with_batch_matmul']' returned non-zero exit status 255.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "quantize.py", line 33, in <module>
    optimized_circle = onecc.optimize(circle, options=optimize_options)
  File "/usr/local/lib/python3.8/dist-packages/onecc-0.1.0+220921195027-py3.8.egg/onecc/commands/optimize/__init__.py", line 44, in optimize
  File "/usr/local/lib/python3.8/dist-packages/onecc-0.1.0+220921195027-py3.8.egg/onecc/cli/onecc.py", line 64, in invoke
onecc.errors.CommandError: Error while running command:
$ /usr/bin/onecc optimize --input_path /tmp/onecc_afwww81g/model_body.0.import.circle --output_path /tmp/onecc_afwww81g/model_body.0.import.0.opt.circle --fuse_add_with_tconv --fuse_add_with_fully_connected --fuse_batchnorm_with_conv --fuse_batchnorm_with_tconv --fuse_batchnorm_with_dwconv --fuse_activation_function --fuse_instnorm --fold_dequantize --fold_densify --substitute_padv2_to_pad --substitute_splitv_to_split --substitute_squeeze_to_reshape --resolve_customop_add --resolve_customop_batchmatmul --resolve_customop_max_pool_with_argmax --resolve_customop_splitv --transform_min_max_to_relu6 --transform_min_relu_to_relu6 --replace_non_const_fc_with_batch_matmul
[EXIT CODE]
255
[STDOUT]
[STDERR]
circle2circle: ERROR: loco::must_cast() failed to cast: PN4luci11CircleConstE

Try re-running the command from the command line.
If you see the same error message from the command line,
you are ready to report an issue to: https://github.com/Samsung/ONE/issues.
When reporting an issue, please make sure you attach the below information.
1. Installed one-compiler version (can be found with `dpkg-query -s one-compiler`)
2. Full command and the necessary files to reproduce the error
```
Here are the scripts and the body subgraph model to reproduce.
- Create a tflite model with a While op
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Build a training pipeline
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

# Build an evaluation pipeline
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    ds_train,
    epochs=1,
    validation_data=ds_test,
)
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
# NOTE Do not set converter.optimizations; that would quantize the weights
# to int8, and "onecc" throws errors when quantizing models that already
# have quantized weights.
tflite_model = converter.convert()

tflite_path = 'model.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model)
```
- Cut only the body graph
$ echo "0-25" > opcode.txt
$ python3 tools/tflitefile_tool/select_operator.py -g 2 model.tflite opcode.txt model_body.tflite
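Before quantizing, it may help to check the I/O signature of the extracted body graph; a sketch using the tf.lite Interpreter (the representative dataset passed to quantization has to match this input list in count and order):

```python
import tensorflow as tf

# Inspect the inputs/outputs of the cut-out WHILE body graph; a mismatch
# between these inputs and the representative dataset can cause
# "record-minmax: ERROR: Wrong number of inputs."
interpreter = tf.lite.Interpreter(model_path='model_body.tflite')
for detail in interpreter.get_input_details():
    print('input :', detail['name'], detail['shape'], detail['dtype'])
for detail in interpreter.get_output_details():
    print('output:', detail['name'], detail['shape'], detail['dtype'])
```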
- Quantize the body graph
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
train_images = [image for image, label in ds_train]

import onecc
import onecc.experimental.auto

quantized_circle_path = 'model_body.q8.circle'
body_tflite_path = 'model_body.tflite'
dtype = 'uint8'

# Get default options (experimental feature)
import_options = onecc.experimental.auto.get_import_options(model='tflite', backend='tv2')
optimize_options = onecc.experimental.auto.get_optimize_options(model='tflite', backend='tv2')
quantize_options = onecc.experimental.auto.get_quantize_options(model='tflite', backend='tv2')

# Prepare representative dataset for quantization
# TODO get random sample
representative_dataset = [
    (np.array(i).astype(np.int32),
     np.array(i).astype(np.int32),
     np.random.rand(1, 20).astype(np.float32),
     train_images[i].numpy().reshape(28, 1, 28).astype(np.float32))
    for i in range(5)
]

# Import, optimize, and quantize the model
circle = onecc.import_tflite(body_tflite_path, options=import_options)
optimized_circle = onecc.optimize(circle, options=optimize_options)
quantized_circle = onecc.quantize(optimized_circle,
                                  dataset=representative_dataset,
                                  quantized_dtype=dtype,
                                  options=quantize_options)

# Save the generated model
quantized_circle.save(quantized_circle_path)
```
This fails with:
```
circle2circle: ERROR: loco::must_cast() failed to cast: PN4luci11CircleConstE
```
@ragmani, please share the input .circle file that was used for /usr/bin/onecc optimize.
Here is the input .circle file: model_body.0.import.zip
For testing, run one-optimize with model_body.cfg:
```
$ one-optimize -C model_body.cfg
```
where model_body.cfg is:
```
[one-optimize]
input_path=model_body.0.import.circle
output_path=model_body.0.import.0.opt.circle
fuse_add_with_tconv=True
fuse_add_with_fully_connected=True
fuse_batchnorm_with_conv=True
fuse_batchnorm_with_tconv=True
fuse_batchnorm_with_dwconv=True
fuse_activation_function=True
fuse_instnorm=True
fold_dequantize=True
fold_densify=True
substitute_padv2_to_pad=True
substitute_splitv_to_split=True
substitute_squeeze_to_reshape=True
resolve_customop_add=True
resolve_customop_batchmatmul=True
resolve_customop_max_pool_with_argmax=True
resolve_customop_splitv=True
transform_min_max_to_relu6=True
transform_min_relu_to_relu6=True
replace_non_const_fc_with_batch_matmul=True
```
The model seems to have dynamic tensors that are outputs of the Slice op.
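To make the point concrete, a minimal sketch (assuming plain TF 2.x; `dyn_slice` is a hypothetical toy function, not part of the model above) of how a runtime-valued `size` makes Slice's output dynamic:

```python
import tensorflow as tf

# When Slice's `size` operand is a runtime tensor rather than a constant,
# the output shape cannot be inferred statically.
@tf.function(input_signature=[tf.TensorSpec([10], tf.float32),
                              tf.TensorSpec([], tf.int32)])
def dyn_slice(x, n):
    return tf.slice(x, [0], [n])  # size depends on `n` -> dynamic tensor

print(dyn_slice.get_concrete_function().output_shapes)  # (None,): unknown size
```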
I tried to quantize the body model after removing the dynamic tensors.
- Error message
```
$ /usr/bin/onecc quantize --input_path /tmp/onecc_vyrt94p6/model_body.0.import.0.opt.circle --output_path /tmp/onecc_vyrt94p6/model_body.0.import.0.opt.0.q.circle --granularity channel --quantized_dtype uint8 --input_data /tmp/onecc_vyrt94p6/dataset.0.h5
[EXIT CODE]
255
[STDOUT]
[STDERR]
/usr/share/one/bin/record-minmax: ERROR: Wrong number of inputs.
```
- Cut model
```
$ echo "0-18 20-21 23 25" > opcode.txt
$ python3 tools/tflitefile_tool/select_operator.py -g 2 model.tflite opcode.txt model_body.tflite
```
- Input model: model_body.0.import.0.opt.zip
@ragmani https://github.com/Samsung/ONE/files/9630977/model_body.0.import.0.opt.zip consists of two graphs.
```
circle2circle: ERROR: loco::must_cast() failed to cast: PN4luci11CircleConstE
```
The direct reason is that `loco::NodeShape infer_slice(const luci::CircleSlice *node)` fails at
```cpp
auto const_size = loco::must_cast<luci::CircleConst *>(node->size());
```
The Slice input (its `size` operand) is a Concat, which is not Const, and currently we only support Const.
Thanks for your kind response.
If the Slice input is not const, the Slice op produces a dynamic output. So, in this issue, it would be better to proceed by quantizing the model with the Slice ops removed, as in https://github.com/Samsung/ONE/issues/8747#issuecomment-1255822642.
> I tried to quantize the body model after removing dynamic tensors.
> - Error message ... /usr/share/one/bin/record-minmax: ERROR: Wrong number of inputs.
That was my mistake. I tried to quantize the model with the wrong representative inputs.
```
$ onecc quantize \
    --input_path model_body.0.import.0.opt.circle \
    --output_path model_body.0.import.0.opt.0.q.circle \
    --granularity channel --quantized_dtype uint8
```
This gave me:
```
Recording 0'th data
Recording 1'th data
Recording 2'th data
Recording finished. Number of recorded data: 3
circle_quantizer: ERROR: Wrong data type detected in while/add_5
```
I tried to proceed with quantizing the model, but I got another error. error_wrong_data_type_detected_in_while-add_5.zip
- The error node (Netron screenshot omitted)
- Types of the model inputs (Netron screenshot omitted)
```
$ /usr/bin/onecc quantize --input_path model_body.0.import.0.opt.circle --output_path model_body.0.import.0.opt.0.q.circle --granularity channel --quantized_dtype uint8 --input_data dataset.0.h5
Recording 0'th data
Recording 1'th data
Recording finished. Number of recorded data: 2
circle_quantizer: ERROR: Wrong data type detected in while/add_5
```
`while/add_5` is of int32 type...
ping @jinevening
@jinevening Please take a look at https://github.com/Samsung/ONE/issues/8747#issuecomment-1255986000
Ah, sorry. I missed the comment. I'm working on supporting int32 operators in the quantizer.
Please note that int32 operators will not be quantized but left as-is, so the backend will receive int32 operators.
https://github.com/Samsung/ONE/pull/9805 will resolve the problem.
@jinevening Thanks for your help. I checked that it works well.
I compiled the model, but almost half of the body graph was cut away by removing the part that could not be compiled to run on the trix backend. I'll try to test the compiled model with the trix backend.
This is the model as a circle file: gru_body_model.zip
Scripts
- Create a tflite model with a While op
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Build a training pipeline
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

# Build an evaluation pipeline
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    ds_train,
    epochs=1,
    validation_data=ds_test,
)
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
# NOTE Do not set converter.optimizations; that would quantize the weights
# to int8, and "onecc" throws errors when quantizing models that already
# have quantized weights.
tflite_model = converter.convert()

tflite_path = 'model.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model)
```
- Cut only the body graph
$ echo "1-2 4-16 23" > opcode.txt
$ python3 tools/tflitefile_tool/select_operator.py -g 2 model.tflite opcode.txt model_body.tflite
- Quantize the body graph
```python
import numpy as np
import tensorflow as tf
'''
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
train_images = [image for image, label in ds_train]
'''
import onecc
import onecc.experimental.auto

quantized_circle_path = 'model_body.q8.circle'
body_tflite_path = 'model_body.tflite'
dtype = 'uint8'

# Get default options (experimental feature)
import_options = onecc.experimental.auto.get_import_options(model='tflite', backend='tv2')
optimize_options = onecc.experimental.auto.get_optimize_options(model='tflite', backend='tv2')
quantize_options = onecc.experimental.auto.get_quantize_options(model='tflite', backend='tv2')

# Prepare representative dataset for quantization
# TODO get random sample
#representative_dataset = [ ( np.array(i).astype(np.int32), np.random.rand(1, 20).astype(np.float32), train_images[i].numpy().reshape(28,1,28).astype(np.float32) ) for i in range(5) ]
representative_dataset = [
    (np.random.rand(1, 20).astype(np.float32) * 255,
     np.random.rand(1, 28).astype(np.float32) * 255)
    for i in range(5)
]

# Import, optimize, and quantize the model
circle = onecc.import_tflite(body_tflite_path, options=import_options)
optimized_circle = onecc.optimize(circle, options=optimize_options)
quantized_circle = onecc.quantize(optimized_circle,
                                  dataset=representative_dataset,
                                  quantized_dtype=dtype,
                                  options=quantize_options)

# Save the generated model
quantized_circle.save(quantized_circle_path)
```
I've heard from @ejjeong that we can consider using the model below: https://github.sec.samsung.net/AIP/NPU_Compiler/blob/8b4825a9a83826b79ec75ece8fc40ff1716b7ff3/res/Collab/Issue/13310/caption_image.ptmex#L45
It is a model that has already been proven to run after unrolling. However, there are two issues with running the model on onert:
- Is there any way to convert rnn onnx model to circle model without unrolling?
- Is there any way to cut rnn circle model?
I made a tvn file of the model in https://github.com/Samsung/ONE/issues/8747#issuecomment-1260829755 and tried to run it manually. It works well.
```
$ BACKENDS=trix /usr/bin/nnfw-test/Product/out/bin/nnpackage_run model_body.q8 --load:raw model_body.q8/input_0.tv2b --dump:raw output.tv2b -w 10 -r 100
Package Filename model_body.q8
output.tv2b.0 is generated.
===================================
MODEL_LOAD   takes 1.741 ms
PREPARE      takes 10.608 ms
EXECUTE      takes 1.262 ms
- MEAN    :  1.262 ms
- MAX     :  5.274 ms
- MIN     :  0.782 ms
- GEOMEAN :  1.134 ms
===================================
```