[luci-micro] Speedup GRU/LSTM operations
What
Implement optimized kernels for the GRU and LSTM operations in the interpreter for MCU.
Why
These operations are currently not optimized: they run slowly and consume a lot of memory.
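For context, one GRU time step is only a few matrix multiplications and element-wise operations; below is a minimal NumPy reference of the computation a fused kernel would optimize (a sketch following the Keras convention with reset_after=False; the weight names W, U, b are illustrative).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell_step(x, h_prev, W, U, b):
    """One GRU time step (Keras convention, reset_after=False).

    x: input (input_dim,), h_prev: previous state (units,)
    W: (3, units, input_dim), U: (3, units, units), b: (3, units)
    Row order of the stacked weights: update (z), reset (r), candidate.
    """
    z = sigmoid(W[0] @ x + U[0] @ h_prev + b[0])          # update gate
    r = sigmoid(W[1] @ x + U[1] @ h_prev + b[1])          # reset gate
    hh = np.tanh(W[2] @ x + U[2] @ (r * h_prev) + b[2])   # candidate state
    return z * h_prev + (1.0 - z) * hh                    # new hidden state
```

An optimized kernel can fuse these per-gate steps to avoid materializing intermediate tensors for every gate at every time step, which is where the current memory overhead comes from.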
+cc @SlavikMIPT, @ai-moiseev
@ai-moiseev Could you schedule the model provided by @SlavikMIPT using https://github.com/Samsung/ONE/tree/master/compiler/circle-execution-plan and the VS Code visualizer?
@SlavikMIPT made a GRU model: model_gru.circle.zip
I made a model with UnidirectionalSequenceLSTM: model_lstm.circle.zip
Current support status
Int8
| keras2tflite conversion | GRU | LSTM |
|---|---|---|
| unroll | #9253 | |
| fused operation | @SlavikMIPT: kernel implemented and tested, cannot save GRU in circle* | |
* The tflite and circle schemas do not have a separate GRU opcode. Ways to support GRU:
- Use Flex operations: tf documentation (see the conversion sketch after this list)
  - need to generate a model with the Flex operation
- Implement a separate opcode in circle and add a fusing pass
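A minimal sketch of the Flex route, assuming a plain Keras model with a GRU layer (the model and file names are illustrative): enabling SELECT_TF_OPS lets the converter export ops that have no TFLite builtin as Flex ops instead of failing the conversion.

```python
import tensorflow as tf

# Illustrative Keras model with a single GRU layer.
model = tf.keras.Sequential(
    [tf.keras.layers.GRU(16, input_shape=(10, 8))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Prefer TFLite builtins, and export anything without a builtin
# as a Flex ("select TF") op instead of failing the conversion.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("model_gru_flex.tflite", "wb") as f:
    f.write(tflite_model)
```

The trade-off is that the resulting model requires the Flex delegate at runtime, which is what the alternative of a dedicated circle opcode plus a fusing pass would avoid.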
float32
| keras2tflite conversion | GRU | LSTM |
|---|---|---|
| unroll | @BalyshevArtem in progress - #9253 | |
| fused operation | @BalyshevArtem in progress - #9263 (+hybrid support) | |
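To make the two table rows concrete, here is a hedged keras2tflite sketch (shapes and file names are illustrative): unroll=True expands the recurrence into per-time-step builtin ops, while with default settings the converter can fuse a Keras LSTM into the single UnidirectionalSequenceLSTM builtin.

```python
import tensorflow as tf

def convert(model, path):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    with open(path, "wb") as f:
        f.write(converter.convert())

# "unroll" row: the recurrence is expanded into per-step builtin ops.
unrolled = tf.keras.Sequential(
    [tf.keras.layers.LSTM(16, unroll=True, input_shape=(10, 8))])
convert(unrolled, "model_lstm_unrolled.tflite")

# "fused operation" row: with default settings the converter can fuse
# a Keras LSTM into the single UnidirectionalSequenceLSTM builtin op.
fused = tf.keras.Sequential(
    [tf.keras.layers.LSTM(16, input_shape=(10, 8))])
convert(fused, "model_lstm_fused.tflite")
```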
Update to https://github.com/Samsung/ONE/issues/9225#issuecomment-1148900583
I've patched TensorFlow (r2.9.0) and got a "Flex" operation: gru_cell.tar.zip
This operation can be run using the TFLite Flex delegate.
What I did with TF is described in this instruction.
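A minimal sketch of running such a model from Python, assuming the full tensorflow pip package (its tf.lite.Interpreter links the Flex delegate automatically, per the ops-select guide); the model file name is an assumption.

```python
import numpy as np
import tensorflow as tf

# With the full tensorflow pip package, tf.lite.Interpreter links the
# Flex delegate automatically, so a model containing Flex ops just runs.
interpreter = tf.lite.Interpreter(model_path="gru_cell.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

interpreter.set_tensor(inp["index"],
                       np.random.rand(*inp["shape"]).astype(np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```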
@chunseoklee, PTAL at the above GRU model for ONERT.
@binarman IMHO, we need to make sure that our target model will be generated the way we have done it. Have you heard anything about this?
> @chunseoklee, PTAL at the above GRU model for ONERT.

If this op is passed as a custom op, ONERT can process it by implementing it (not that hard). But, at a glance, this Flex operation is not a custom op. I will take a look.
@chunseoklee

> IMHO, we need to make sure that our target model will be generated the way we have done it. Have you heard anything about this?

Not yet; maybe in the near future I'll get some information related to this topic...
> But, at a glance, this Flex operation is not a custom op

This Flex op is a custom op, but it is a "special" kind of custom op that is supported by the Flex delegate in TFLite: https://www.tensorflow.org/lite/guide/ops_select
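One way to check this from Python is TFLite's model analyzer (available as tf.lite.experimental.Analyzer since TF 2.7; the model path is an assumption): Flex ops show up in the flatbuffer as custom ops whose names carry a "Flex" prefix.

```python
import tensorflow as tf

# Prints the model's operator list; select-TF ops appear as custom ops
# with a "Flex"-prefixed name (e.g. "FlexTensorListReserve").
tf.lite.experimental.Analyzer.analyze(model_path="gru_cell.tflite")
```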