[luci-micro] Speedup GRU/LSTM operations
What
Implement optimized kernels for the GRU and LSTM operations in the interpreter for MCU.
Why
These operations are currently not optimized: they run slowly and consume a lot of memory.
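For context, one GRU time step is only a few matrix multiplications and element-wise operations; below is a minimal NumPy reference of the computation a fused kernel would optimize (a sketch following the Keras convention with reset_after=False; the weight names W, U, b are illustrative).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell_step(x, h_prev, W, U, b):
    """One GRU time step (Keras convention, reset_after=False).

    x: input (input_dim,), h_prev: previous state (units,)
    W: (3, units, input_dim), U: (3, units, units), b: (3, units)
    Row order of the stacked weights: update (z), reset (r), candidate.
    """
    z = sigmoid(W[0] @ x + U[0] @ h_prev + b[0])          # update gate
    r = sigmoid(W[1] @ x + U[1] @ h_prev + b[1])          # reset gate
    hh = np.tanh(W[2] @ x + U[2] @ (r * h_prev) + b[2])   # candidate state
    return z * h_prev + (1.0 - z) * hh                    # new hidden state
```

An optimized kernel can fuse these per-gate steps to avoid materializing intermediate tensors for every gate at every time step, which is where the current memory overhead comes from.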
+cc @SlavikMIPT, @ai-moiseev
@ai-moiseev Could you schedule the model provided by @SlavikMIPT using https://github.com/Samsung/ONE/tree/master/compiler/circle-execution-plan and the VS Code visualizer?
@SlavikMIPT made a GRU model: model_gru.circle.zip
I made a model with UnidirectionalSequenceLSTM: model_lstm.circle.zip
Current support status
Int8
| keras2tflite conversion | GRU | LSTM |
|---|---|---|
| unroll | #9253 | |
| fused operation | @SlavikMIPT: kernel implemented and tested, cannot save GRU in circle* | |
* The tflite and circle schemas do not have a separate GRU opcode. Ways to support GRU:
- Use Flex operations: tf documentation (see the conversion sketch after this list)
  - need to generate a model with the Flex operation
- Implement a separate opcode in circle and add a fusing pass
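A minimal sketch of the Flex route, assuming a plain Keras model with a GRU layer (the model and file names are illustrative): enabling SELECT_TF_OPS lets the converter export ops that have no TFLite builtin as Flex ops instead of failing the conversion.

```python
import tensorflow as tf

# Illustrative Keras model with a single GRU layer.
model = tf.keras.Sequential(
    [tf.keras.layers.GRU(16, input_shape=(10, 8))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Prefer TFLite builtins, and export anything without a builtin
# as a Flex ("select TF") op instead of failing the conversion.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("model_gru_flex.tflite", "wb") as f:
    f.write(tflite_model)
```

The trade-off is that the resulting model requires the Flex delegate at runtime, which is what the alternative of a dedicated circle opcode plus a fusing pass would avoid.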
float32
| keras2tflite conversion | GRU | LSTM |
|---|---|---|
| unroll | @BalyshevArtem in progress - #9253 | |
| fused operation | @BalyshevArtem in progress - #9263 (+hybrid support) | |
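To make the two table rows concrete, here is a hedged keras2tflite sketch (shapes and file names are illustrative): unroll=True expands the recurrence into per-time-step builtin ops, while with default settings the converter can fuse a Keras LSTM into the single UnidirectionalSequenceLSTM builtin.

```python
import tensorflow as tf

def convert(model, path):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    with open(path, "wb") as f:
        f.write(converter.convert())

# "unroll" row: the recurrence is expanded into per-step builtin ops.
unrolled = tf.keras.Sequential(
    [tf.keras.layers.LSTM(16, unroll=True, input_shape=(10, 8))])
convert(unrolled, "model_lstm_unrolled.tflite")

# "fused operation" row: with default settings the converter can fuse
# a Keras LSTM into the single UnidirectionalSequenceLSTM builtin op.
fused = tf.keras.Sequential(
    [tf.keras.layers.LSTM(16, input_shape=(10, 8))])
convert(fused, "model_lstm_fused.tflite")
```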
Update to https://github.com/Samsung/ONE/issues/9225#issuecomment-1148900583
I've patched TensorFlow (r2.9.0) and got a "Flex" operation: gru_cell.tar.zip
This operation can be run using the TFLite Flex delegate.
What I did with TF is described in this instruction.
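A minimal sketch of running such a model from Python, assuming the full tensorflow pip package (its tf.lite.Interpreter links the Flex delegate automatically, per the ops-select guide); the model file name is an assumption.

```python
import numpy as np
import tensorflow as tf

# With the full tensorflow pip package, tf.lite.Interpreter links the
# Flex delegate automatically, so a model containing Flex ops just runs.
interpreter = tf.lite.Interpreter(model_path="gru_cell.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

interpreter.set_tensor(inp["index"],
                       np.random.rand(*inp["shape"]).astype(np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```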
@chunseoklee, PTAL at the above GRU model for ONERT.
@binarman IMHO, we need to make sure that our target model will be generated the way we have done it. Have you heard anything about this?
> @chunseoklee, PTAL at the above GRU model for ONERT.

If this op is passed as a custom op, ONERT can process it by implementing it (not that hard). But, at a glance, this Flex operation is not a custom op. I will take a look.
@chunseoklee

> IMHO, we need to make sure that our target model will be generated the way we have done it. Have you heard anything about this?

Not yet; maybe in the near future I'll get some information related to this topic...
> But, at a glance, this Flex operation is not a custom op

This Flex op is a custom op, but it is a "special" kind of custom op that is supported by the Flex delegate in TFLite: https://www.tensorflow.org/lite/guide/ops_select
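One way to check this from Python is TFLite's model analyzer (available as tf.lite.experimental.Analyzer since TF 2.7; the model path is an assumption): Flex ops show up in the flatbuffer as custom ops whose names carry a "Flex" prefix.

```python
import tensorflow as tf

# Prints the model's operator list; select-TF ops appear as custom ops
# with a "Flex"-prefixed name (e.g. "FlexTensorListReserve").
tf.lite.experimental.Analyzer.analyze(model_path="gru_cell.tflite")
```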