Let's find a reference model for RNN support
For the milestone in https://github.com/Samsung/ONE/projects/9#card-79474017
Candidate 1
- one-cmds PyTorch (or ONNX) LSTM op import fails · Issue #8217
- model link: https://github.com/Samsung/ONE/files/7860779/LSTM.zip
Candidate 2
Based on https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/experimental_new_converter/Keras_LSTM_fusion_Codelab.ipynb
We can generate other RNN models such as SimpleRNN, LSTM, and GRU. Here is an example script that generates a model with GRU:
```python
# !pip install tensorflow==2.7.0
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

# model directory.
MODEL_DIR = "keras_lstm"
model.save(MODEL_DIR, save_format="tf", signatures=concrete_func)

converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_DIR)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```
- model link with GRU: gru.zip
- model link with LSTM: https://github.com/Samsung/ONE/files/8377917/model_LSTM_keras.zip
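To double-check which ops ended up in the converted file, a small sketch using the TFLite model analyzer (available from roughly TF 2.7, which the script above pins) can help; the per-subgraph `Op#N` listing it prints is also handy later when cutting subgraphs:

```python
import tensorflow as tf

# Print each subgraph and its operators (Op#0, Op#1, ...). For the GRU
# model a WHILE op with separate cond/body subgraphs is expected, while
# the LSTM variant should show a fused UnidirectionalSequenceLSTM instead.
tf.lite.experimental.Analyzer.analyze(model_path='model.tflite')
```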
Candidate 3
Based on PyTorch tutorials:
- https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
- https://pytorch.org/tutorials/beginner/chatbot_tutorial.html
Considering the objective in https://github.com/Samsung/ONE/projects/9 ("RNN Model with single while loop of non-dynamic tensor"), a while loop is required.
With the PyTorch tutorials above, I could prepare a simple encoder model with the script below:
```python
import torch
import torch.onnx
import onnx

torch.manual_seed(1)

class SimpleEncoder(torch.nn.Module):
    def __init__(self, hidden_size, n_layers=1):
        super(SimpleEncoder, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.lstm = torch.nn.LSTM(hidden_size, hidden_size, n_layers)

    def forward(self, input_seq, input_lengths, hidden=None):
        outputs, hidden = self.lstm(input_seq, hidden)
        return outputs, hidden

n_layers = 1
hidden_size = 16
encoder = SimpleEncoder(hidden_size, n_layers)

inputs = torch.randn(n_layers, 2, hidden_size)
print("inputs =", inputs)
h0 = torch.randn(n_layers, 2, hidden_size)
c0 = torch.randn(n_layers, 2, hidden_size)
outputs, (hn, cn) = encoder(inputs, 1, (h0, c0))
print("outputs =", outputs)
print("hn =", hn)
print("cn =", cn)

input_names = ["input", "h0", "c0"]
output_names = ["output", "hn", "cn"]
# NOTE `input_lengths` is unused in forward(); a dummy value is passed here
# so that (h0, c0) binds to the `hidden` argument and h0/c0 appear as
# actual graph inputs in the exported ONNX model.
torch.onnx.export(encoder,
                  (inputs, 1, (h0, c0)),
                  "simple_encoder_01.onnx",
                  input_names=input_names,
                  output_names=output_names)

def save_with_shape(fname, fname_si):
    model = onnx.load(fname)
    model_si = onnx.shape_inference.infer_shapes(model)
    onnx.save(model_si, fname_si)

save_with_shape("simple_encoder_01.onnx", "simple_encoder_01_si.onnx")
```
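To confirm what the exporter produced, a small sketch using the onnx package already imported above (the op list should include an LSTM node, or a Loop if the exporter unrolls differently):

```python
import onnx

# Dump the op types of the exported graph to verify an LSTM node is present.
m = onnx.load("simple_encoder_01.onnx")
print([n.op_type for n in m.graph.node])
```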
In Candidate 2, replacing GRU with LSTM yields a model without a WHILE op; instead the loop is fused into UnidirectionalSequenceLSTM, which is a built-in TFLite operation. model_LSTM_keras.zip
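For clarity, the only change to the Candidate 2 script is the recurrent layer; a minimal sketch (same layer parameters assumed):

```python
import tensorflow as tf

# Swapping GRU for LSTM; the TFLite converter fuses this into a single
# UnidirectionalSequenceLSTM op, so no WHILE loop appears in the model.
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.LSTM(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
```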
cc @ragmani
I tried to obtain a fully quantized (w8a8) model as follows, but failed to get a fully quantized one.
```python
# !pip install tensorflow==2.7.0
import numpy as np
import tensorflow as tf

def representative_dataset():
    for _ in range(100):
        data = np.random.rand(1, 28, 28)
        yield [data.astype(np.float32)]

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

# model directory.
MODEL_DIR = "keras_lstm"
model.save(MODEL_DIR, save_format="tf", signatures=concrete_func)

converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
with open('model_q8.tflite', 'wb') as f:
    f.write(tflite_model)
```
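As an aside, by default the converter silently keeps ops it cannot quantize in float and brackets them with Quantize/Dequantize. A hedged sketch of the usual way to turn that fallback into a hard error (not verified on this exact model):

```python
import tensorflow as tf

# Hypothetical addition to the converter setup above: restrict conversion
# to int8 builtin kernels. Ops without an int8 kernel (likely WHILE here)
# then fail conversion instead of silently staying float.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
```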
Here is a Netron snapshot (image omitted), which shows Dequantize and Quantize ops around the While operation. It seems that TFLiteConverter does not support quantization of the While op.
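One quick way to confirm which tensors the converter left in float (a sketch using the tf.lite Interpreter; by default only the main subgraph is listed, which is where the Quantize/Dequantize wrappers around While live):

```python
import tensorflow as tf

# List tensor dtypes of the quantized model; float32 entries around the
# While op indicate the parts the converter left unquantized.
interpreter = tf.lite.Interpreter(model_path='model_q8.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    print(detail['name'], detail['dtype'])
```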
I tried to quantize the body subgraph but failed.
- Error message
```
2022-09-22 21:50:07.264869: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/onecc-0.1.0+220921195027-py3.8.egg/onecc/cli/onecc.py", line 40, in invoke
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/onecc', 'optimize', '--input_path', '/tmp/onecc_afwww81g/model_body.0.import.circle', '--output_path', '/tmp/onecc_afwww81g/model_body.0.import.0.opt.circle', '--fuse_add_with_tconv', '--fuse_add_with_fully_connected', '--fuse_batchnorm_with_conv', '--fuse_batchnorm_with_tconv', '--fuse_batchnorm_with_dwconv', '--fuse_activation_function', '--fuse_instnorm', '--fold_dequantize', '--fold_densify', '--substitute_padv2_to_pad', '--substitute_splitv_to_split', '--substitute_squeeze_to_reshape', '--resolve_customop_add', '--resolve_customop_batchmatmul', '--resolve_customop_max_pool_with_argmax', '--resolve_customop_splitv', '--transform_min_max_to_relu6', '--transform_min_relu_to_relu6', '--replace_non_const_fc_with_batch_matmul']' returned non-zero exit status 255.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "quantize.py", line 33, in <module>
    optimized_circle = onecc.optimize(circle, options=optimize_options)
  File "/usr/local/lib/python3.8/dist-packages/onecc-0.1.0+220921195027-py3.8.egg/onecc/commands/optimize/__init__.py", line 44, in optimize
  File "/usr/local/lib/python3.8/dist-packages/onecc-0.1.0+220921195027-py3.8.egg/onecc/cli/onecc.py", line 64, in invoke
onecc.errors.CommandError: Error while running command:
$ /usr/bin/onecc optimize --input_path /tmp/onecc_afwww81g/model_body.0.import.circle --output_path /tmp/onecc_afwww81g/model_body.0.import.0.opt.circle --fuse_add_with_tconv --fuse_add_with_fully_connected --fuse_batchnorm_with_conv --fuse_batchnorm_with_tconv --fuse_batchnorm_with_dwconv --fuse_activation_function --fuse_instnorm --fold_dequantize --fold_densify --substitute_padv2_to_pad --substitute_splitv_to_split --substitute_squeeze_to_reshape --resolve_customop_add --resolve_customop_batchmatmul --resolve_customop_max_pool_with_argmax --resolve_customop_splitv --transform_min_max_to_relu6 --transform_min_relu_to_relu6 --replace_non_const_fc_with_batch_matmul
[EXIT CODE]
255
[STDOUT]
[STDERR]
circle2circle: ERROR: loco::must_cast() failed to cast: PN4luci11CircleConstE

Try re-running the command from the command line.
If you see the same error message from the command line,
you are ready to report an issue to: https://github.com/Samsung/ONE/issues.
When reporting an issue, please make sure you attach the below information.
1. Installed one-compiler version (can be found with `dpkg-query -s one-compiler`)
2. Full command and the necessary files to reproduce the error
```
Here are the scripts and the body subgraph model to reproduce.
- Create a tflite model with a While op
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Build a training pipeline
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

# Build an evaluation pipeline
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    ds_train,
    epochs=1,
    validation_data=ds_test,
)
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
# NOTE Do not set converter.optimizations; that would quantize the weights
# to int8, and "onecc" throws errors when quantizing models that already
# have quantized weights.
tflite_model = converter.convert()

tflite_path = 'model.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model)
```
- Cut only the body graph
$ echo "0-25" > opcode.txt
$ python3 tools/tflitefile_tool/select_operator.py -g 2 model.tflite opcode.txt model_body.tflite
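Before quantizing, it may help to check the I/O signature of the extracted body graph; a sketch using the tf.lite Interpreter (the representative dataset passed to quantization has to match this input list in count and order):

```python
import tensorflow as tf

# Inspect the inputs/outputs of the cut-out WHILE body graph; a mismatch
# between these inputs and the representative dataset can cause
# "record-minmax: ERROR: Wrong number of inputs."
interpreter = tf.lite.Interpreter(model_path='model_body.tflite')
for detail in interpreter.get_input_details():
    print('input :', detail['name'], detail['shape'], detail['dtype'])
for detail in interpreter.get_output_details():
    print('output:', detail['name'], detail['shape'], detail['dtype'])
```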
- Quantize the body graph
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
train_images = [image for image, label in ds_train]

import onecc
import onecc.experimental.auto

quantized_circle_path = 'model_body.q8.circle'
body_tflite_path = 'model_body.tflite'
dtype = 'uint8'

# Get default options (experimental feature)
import_options = onecc.experimental.auto.get_import_options(model='tflite', backend='tv2')
optimize_options = onecc.experimental.auto.get_optimize_options(model='tflite', backend='tv2')
quantize_options = onecc.experimental.auto.get_quantize_options(model='tflite', backend='tv2')

# Prepare representative dataset for quantization
# TODO get random sample
representative_dataset = [
    (np.array(i).astype(np.int32),
     np.array(i).astype(np.int32),
     np.random.rand(1, 20).astype(np.float32),
     train_images[i].numpy().reshape(28, 1, 28).astype(np.float32))
    for i in range(5)
]

# Import, optimize, and quantize the model
circle = onecc.import_tflite(body_tflite_path, options=import_options)
optimized_circle = onecc.optimize(circle, options=optimize_options)
quantized_circle = onecc.quantize(optimized_circle,
                                  dataset=representative_dataset,
                                  quantized_dtype=dtype,
                                  options=quantize_options)

# Save the generated model
quantized_circle.save(quantized_circle_path)
```
This fails with:
```
circle2circle: ERROR: loco::must_cast() failed to cast: PN4luci11CircleConstE
```
@ragmani, please share the input .circle file that was used for /usr/bin/onecc optimize.
Here is the input .circle file: model_body.0.import.zip
For testing, run one-optimize with model_body.cfg:
```
$ one-optimize -C model_body.cfg
```
where model_body.cfg is:
```
[one-optimize]
input_path=model_body.0.import.circle
output_path=model_body.0.import.0.opt.circle
fuse_add_with_tconv=True
fuse_add_with_fully_connected=True
fuse_batchnorm_with_conv=True
fuse_batchnorm_with_tconv=True
fuse_batchnorm_with_dwconv=True
fuse_activation_function=True
fuse_instnorm=True
fold_dequantize=True
fold_densify=True
substitute_padv2_to_pad=True
substitute_splitv_to_split=True
substitute_squeeze_to_reshape=True
resolve_customop_add=True
resolve_customop_batchmatmul=True
resolve_customop_max_pool_with_argmax=True
resolve_customop_splitv=True
transform_min_max_to_relu6=True
transform_min_relu_to_relu6=True
replace_non_const_fc_with_batch_matmul=True
```
The model seems to have dynamic tensors that are outputs of the Slice op.
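To make the point concrete, a minimal sketch (assuming plain TF 2.x; `dyn_slice` is a hypothetical toy function, not part of the model above) of how a runtime-valued `size` makes Slice's output dynamic:

```python
import tensorflow as tf

# When Slice's `size` operand is a runtime tensor rather than a constant,
# the output shape cannot be inferred statically.
@tf.function(input_signature=[tf.TensorSpec([10], tf.float32),
                              tf.TensorSpec([], tf.int32)])
def dyn_slice(x, n):
    return tf.slice(x, [0], [n])  # size depends on `n` -> dynamic tensor

print(dyn_slice.get_concrete_function().output_shapes)  # (None,): unknown size
```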
I tried to quantize the body model after removing the dynamic tensors.
- Error message
```
$ /usr/bin/onecc quantize --input_path /tmp/onecc_vyrt94p6/model_body.0.import.0.opt.circle --output_path /tmp/onecc_vyrt94p6/model_body.0.import.0.opt.0.q.circle --granularity channel --quantized_dtype uint8 --input_data /tmp/onecc_vyrt94p6/dataset.0.h5
[EXIT CODE]
255
[STDOUT]
[STDERR]
/usr/share/one/bin/record-minmax: ERROR: Wrong number of inputs.
```
- Cut model
```
$ echo "0-18 20-21 23 25" > opcode.txt
$ python3 tools/tflitefile_tool/select_operator.py -g 2 model.tflite opcode.txt model_body.tflite
```
- Input model: model_body.0.import.0.opt.zip
@ragmani https://github.com/Samsung/ONE/files/9630977/model_body.0.import.0.opt.zip consists of two graphs.
```
circle2circle: ERROR: loco::must_cast() failed to cast: PN4luci11CircleConstE
```
The direct reason is that `loco::NodeShape infer_slice(const luci::CircleSlice *node)` fails at
```cpp
auto const_size = loco::must_cast<luci::CircleConst *>(node->size());
```
The Slice input (its `size` operand) is a Concat, which is not Const, and currently we only support Const.
Thanks for your kind response.
If the Slice input is not const, the Slice op produces a dynamic output. So, in this issue, it would be better to proceed by quantizing the model with the Slice ops removed, as in https://github.com/Samsung/ONE/issues/8747#issuecomment-1255822642.
> I tried to quantize the body model after removing dynamic tensors.
> - Error message ... /usr/share/one/bin/record-minmax: ERROR: Wrong number of inputs.
That was my mistake. I tried to quantize the model with the wrong representative inputs.
```
$ onecc quantize \
    --input_path model_body.0.import.0.opt.circle \
    --output_path model_body.0.import.0.opt.0.q.circle \
    --granularity channel --quantized_dtype uint8
```
This gave me:
```
Recording 0'th data
Recording 1'th data
Recording 2'th data
Recording finished. Number of recorded data: 3
circle_quantizer: ERROR: Wrong data type detected in while/add_5
```
I tried to proceed with quantizing the model, but I got another error. error_wrong_data_type_detected_in_while-add_5.zip
- The error node (Netron screenshot omitted)
- Types of the model inputs (Netron screenshot omitted)
```
$ /usr/bin/onecc quantize --input_path model_body.0.import.0.opt.circle --output_path model_body.0.import.0.opt.0.q.circle --granularity channel --quantized_dtype uint8 --input_data dataset.0.h5
Recording 0'th data
Recording 1'th data
Recording finished. Number of recorded data: 2
circle_quantizer: ERROR: Wrong data type detected in while/add_5
```
`while/add_5` is of int32 type...
ping @jinevening
@jinevening Please take a look at https://github.com/Samsung/ONE/issues/8747#issuecomment-1255986000
Ah, sorry. I missed the comment. I'm working on supporting int32 operators in the quantizer.
Please note that int32 operators will not be quantized but left as-is, so the backend will receive int32 operators.
https://github.com/Samsung/ONE/pull/9805 will resolve the problem.
@jinevening Thanks for your help. I checked that it works well.
I compiled the model, but almost half of the body graph was cut away by removing the part that could not be compiled to run on the trix backend. I'll try to test the compiled model with the trix backend.
This is the model as a circle file: gru_body_model.zip
Scripts
- Create a tflite model with a While op
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

# Build a training pipeline
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

# Build an evaluation pipeline
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(28, 28), name='input'),
    tf.keras.layers.GRU(20, time_major=False, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='output')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    ds_train,
    epochs=1,
    validation_data=ds_test,
)
model.summary()

run_model = tf.function(lambda x: model(x))
# This is important, let's fix the input size.
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
# NOTE Do not set converter.optimizations; that would quantize the weights
# to int8, and "onecc" throws errors when quantizing models that already
# have quantized weights.
tflite_model = converter.convert()

tflite_path = 'model.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model)
```
- Cut only the body graph
$ echo "1-2 4-16 23" > opcode.txt
$ python3 tools/tflitefile_tool/select_operator.py -g 2 model.tflite opcode.txt model_body.tflite
- Quantize the body graph
```python
import numpy as np
import tensorflow as tf
'''
import tensorflow_datasets as tfds

# Load a dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
train_images = [image for image, label in ds_train]
'''
import onecc
import onecc.experimental.auto

quantized_circle_path = 'model_body.q8.circle'
body_tflite_path = 'model_body.tflite'
dtype = 'uint8'

# Get default options (experimental feature)
import_options = onecc.experimental.auto.get_import_options(model='tflite', backend='tv2')
optimize_options = onecc.experimental.auto.get_optimize_options(model='tflite', backend='tv2')
quantize_options = onecc.experimental.auto.get_quantize_options(model='tflite', backend='tv2')

# Prepare representative dataset for quantization
# TODO get random sample
#representative_dataset = [ ( np.array(i).astype(np.int32), np.random.rand(1, 20).astype(np.float32), train_images[i].numpy().reshape(28,1,28).astype(np.float32) ) for i in range(5) ]
representative_dataset = [
    (np.random.rand(1, 20).astype(np.float32) * 255,
     np.random.rand(1, 28).astype(np.float32) * 255)
    for i in range(5)
]

# Import, optimize, and quantize the model
circle = onecc.import_tflite(body_tflite_path, options=import_options)
optimized_circle = onecc.optimize(circle, options=optimize_options)
quantized_circle = onecc.quantize(optimized_circle,
                                  dataset=representative_dataset,
                                  quantized_dtype=dtype,
                                  options=quantize_options)

# Save the generated model
quantized_circle.save(quantized_circle_path)
```
I've heard from @ejjeong that we can consider using the model below: https://github.sec.samsung.net/AIP/NPU_Compiler/blob/8b4825a9a83826b79ec75ece8fc40ff1716b7ff3/res/Collab/Issue/13310/caption_image.ptmex#L45
It is a model that has already been proven to run after unrolling. However, there are two issues with running the model on onert:
- Is there any way to convert rnn onnx model to circle model without unrolling?
- Is there any way to cut rnn circle model?
I made a tvn file of the model in https://github.com/Samsung/ONE/issues/8747#issuecomment-1260829755 and tried to run it manually. It works well.
```
$ BACKENDS=trix /usr/bin/nnfw-test/Product/out/bin/nnpackage_run model_body.q8 --load:raw model_body.q8/input_0.tv2b --dump:raw output.tv2b -w 10 -r 100
Package Filename model_body.q8
output.tv2b.0 is generated.
===================================
MODEL_LOAD   takes 1.741 ms
PREPARE      takes 10.608 ms
EXECUTE      takes 1.262 ms
- MEAN    :  1.262 ms
- MAX     :  5.274 ms
- MIN     :  0.782 ms
- GEOMEAN :  1.134 ms
===================================
```