onnx-tensorflow
ONNX->"TensorFlow Lite micro" fundamentally broken because of transpose operations
Problem
In my use case, I need to port a model (containing only conv, relu, and maxpool layers) from PyTorch to TensorFlow Lite Micro, using the following chain of conversions: PyTorch->ONNX->TensorFlow->TensorFlow Lite Micro. However, the microcontroller kernels of TFLite only support the NHWC format, and there is no support for a transpose layer (see supported layers here). In CPU mode, onnx-tf inserts transpose operations before and after every conv layer so that the conv layers can work with NHWC data, while the rest of the network is kept in NCHW format. What I need is a network in which every layer uses NHWC tensors and no transpose operations are present (the input/output layers should also be in NHWC format).
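For reference, a minimal sketch of the conversion chain described above (the model `pytorch_model`, shapes, and file names are placeholders; exact flags may differ between versions):

```python
import onnx
import tensorflow as tf
import torch
from onnx_tf.backend import prepare

# 1. PyTorch -> ONNX (PyTorch is channel-first, so the ONNX model is NCHW).
dummy_input = torch.randn(1, 3, 32, 32)           # placeholder input shape
torch.onnx.export(pytorch_model, dummy_input, "model.onnx", opset_version=11)

# 2. ONNX -> TensorFlow SavedModel. With device='CPU', onnx-tf wraps every
#    conv in transpose ops (NCHW -> NHWC -> conv -> NCHW).
tf_rep = prepare(onnx.load("model.onnx"), device="CPU")
tf_rep.export_graph("saved_model")

# 3. SavedModel -> TFLite. The transpose ops survive into the .tflite file,
#    which TFLite Micro cannot execute (no transpose kernel).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```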
It would be nice if onnx-tf had a mode that completely converts a model using only the NCHW format into a model using only the NHWC format, instead of creating hybrids as the current implementation does. I know this is currently not supported and there is probably no easy solution, but I want to point out that the conversion from ONNX->TFLite Micro is currently fundamentally broken for any model because of the transpose ops.
We could also fix the problem on the TFLite Micro side by implementing a transpose op, but that is a more wasteful solution given the unnecessary overhead of transpose operations on an already very constrained platform. Fixing the issue in TensorFlow by rewriting the graph seems too complex and error-prone with respect to future changes.
I therefore kindly ask to consider implementing an additional mode in the converter if possible. I'm willing to contribute if necessary.
Python, ONNX, ONNX-TF, TensorFlow versions
- Python version: 3.6.5
- ONNX version: 1.8.0
- ONNX-TF version: 1.7.0 (master branch, commit: db092105ceebe076610a1b27c1fc1553978c17cd)
- Tensorflow version: 2.3.1
The request is reasonable. However, a network with only NHWC tensors for all layers is the complete opposite of the current ONNX spec. While the model input/output could be in any format, a number of ONNX operators mandate NCHW, including conv, pooling, BatchNorm, resize, etc. That is the reason onnx-tf has to insert transposes when the target runtime is NHWC.
Adding a new option to onnx-tf, indicating that the entire ONNX model is NHWC and no transposes are needed, is technically possible. The downside is that such a model won't work with any other backend unless a similar option is implemented there. Additionally, we would need the frontend converter, like PyTorch to ONNX, to produce NHWC models.
I think this particular issue should be addressed in the ONNX core/spec by adding an optional attribute for NHWC. It would then be clear in the ONNX model which data format is used, and easy for all converters to work accordingly.
I will present this desired new feature to the ONNX steering committee next week. It would be wonderful if you could draft a proposal and help with the implementation. Thanks!
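Until such a spec-level attribute exists, one lightweight and purely illustrative way to carry a layout hint would be the existing metadata_props field of ModelProto; this is not part of the operator spec, and a backend would have to agree to read this key:

```python
import onnx

model = onnx.load("model.onnx")

# Attach a layout hint as free-form model metadata. This is NOT part of the
# ONNX operator spec; converters would have to agree on this key by convention.
prop = model.metadata_props.add()
prop.key = "data_format"
prop.value = "NHWC"

onnx.save(model, "model_nhwc_tagged.onnx")
```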
Hi @chinhuang007. Thanks for the detailed response. I actually don't mind that ONNX demands NCHW. I think this is a good thing because it standardizes the exchange format. Allowing too much freedom makes the standard complex, and makes converting from/to ONNX complex too, since all sorts of input/output combinations would have to be supported.
Suppose you added an option that specifies the tensor format in ONNX and we exported from PyTorch: the option would indicate NCHW, since PyTorch uses NCHW internally. We would still need to convert the model when going from ONNX to TFLite Micro, since it needs NHWC. If, instead, we exported the model from PyTorch to ONNX and specified that the ONNX format must match the format of the target (NHWC), then the PyTorch to ONNX converter would need to implement a conversion from NCHW to NHWC. This would not make things easier in general; it just creates a different problem and requires PyTorch to update their converters.
I've made a proof-of-concept implementation for ONNX-TF recently that works for my scenario. It probably needs more changes and testing to work in general, but it shows that this can be done with minor changes. ONNX-TF has a sys_config.device attribute that currently is either 'CUDA' or 'CPU'. I added a third option named 'MCU' (you could also call it micro). If you specify 'MCU', the model inputs are converted from NCHW to NHWC, and get_data_format returns NHWC for both the storage_format and compute_format variables. For convolutions, 'MCU' acts like 'CUDA': it does not insert transpose operations. I also made a small change in the padding layer, but that is basically it.
I've implemented the changes in a fork of onnx_tf. You can review them here
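For illustration only (the actual fork may differ), the core of the change could look roughly like the sketch below, following the description above: a third device value that keeps both the storage and compute formats channel-last so no transposes are emitted:

```python
def get_data_format(x_rank, device):
    """Return (storage_format, compute_format) for an op with rank x_rank.

    Sketch of the behaviour described above; not the exact fork code.
    """
    # Spatial dimension letters for 1D/2D/3D ops, e.g. rank 4 -> "HW".
    spatial = "".join(["D", "H", "W"][-(x_rank - 2):])
    storage_format = "NC" + spatial            # ONNX tensors are channel-first
    if device == "CUDA":
        compute_format = "NC" + spatial        # compute in NCHW, no transposes
    elif device == "MCU":
        storage_format = "N" + spatial + "C"   # assume the whole graph is NHWC
        compute_format = "N" + spatial + "C"   # so no transposes are needed either
    else:  # "CPU"
        compute_format = "N" + spatial + "C"   # NHWC compute -> transposes inserted
    return storage_format, compute_format


# Example: a 2D convolution input (rank 4) on an MCU target.
print(get_data_format(4, "MCU"))               # ('NHWC', 'NHWC')
```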
Currently ONNX requires NCHW for certain operators, not all, and not for input/output. Therefore, technically speaking, ONNX doesn't standardize/enforce the data format for the entire model, which makes conversion complicated. At times the frontend converter has to add transposes because the input tensor is in NHWC. PyTorch is only one of many frontend frameworks. As a standard, I think ONNX should be format neutral, since some frameworks are channel-last by default.
Should ONNX have the option I mentioned, PyTorch could add an NHWC export option so that the ONNX model is optimized for channel-last backends.
The MCU prototype seems to assume the input is always in NCHW. I don't think that is always the case. The frontend converters could create a model with input/output in NHWC and insert transposes in the model graph before/after the operators requiring NCHW.
We could certainly add such an option to onnx-tf, given that the user knows for sure the input tensors are all in NCHW. I am wondering whether sys_config.device is the best place for it, since that attribute was originally meant to indicate whether the target runtime environment has a GPU or not, i.e. the compute_format. I believe MCU means: 1. the input format is NCHW for all tensors, and 2. we use NHWC for both storage_format and compute_format. Therefore it feels like another option for storage_format. Or we could expand the sys_config.device definition to cover storage_format, for example:
- CUDA: storage_format=NCHW, compute_format=NCHW
- CPU: storage_format=NCHW, compute_format=NHWC
- MCU: storage_format=NHWC, compute_format=NHWC
The other side effect of MCU is that the converted model will differ from the original in terms of input shape. Users need to be aware of this change because the same inputs won't work with the converted model.
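Expressed as data, the expanded definition could look like the sketch below (illustrative names only, not current onnx-tf code); it also shows the input-shape side effect just mentioned, i.e. callers would have to feed channel-last inputs:

```python
import numpy as np

# Proposed meaning of sys_config.device (sketch of the mapping above).
DEVICE_FORMATS = {
    "CUDA": {"storage_format": "NCHW", "compute_format": "NCHW"},
    "CPU":  {"storage_format": "NCHW", "compute_format": "NHWC"},
    "MCU":  {"storage_format": "NHWC", "compute_format": "NHWC"},
}

# Side effect of MCU: the converted model expects NHWC inputs, so inputs
# prepared for the original NCHW model must be transposed before use.
x_nchw = np.random.randn(1, 3, 32, 32).astype(np.float32)   # original input
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))                  # (1, 32, 32, 3)
```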
Dear @chinhuang007, I think the same graph can run on both CUDA and CPU with TensorFlow, so why does the converter produce different graphs, with transpose ops, depending on sys_config.device? The transpose ops also increase inference time on mobile phones. I'm happy to help solve this problem.
Thanks.
The sys_config.device attribute indicates the target runtime device. Some TensorFlow ops, such as https://www.tensorflow.org/api_docs/python/tf/nn/convolution, have a data_format argument and process data differently based on that format. We cannot leave it at the default (channel-last), because the data format in ONNX is channel-first; otherwise the results won't be correct.
Having said that, I will be glad to learn more about your solution.
The current onnx-tf converts ONNX to a TensorFlow SavedModel, and is obviously not targeted at or optimized for TFLite. So maybe a TFLite optimizer can be developed as a new feature.
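To illustrate the data_format point above, a small sketch (shapes and values are arbitrary):

```python
import tensorflow as tf

filters = tf.random.normal([3, 3, 3, 8])     # HWIO filter layout in both cases

# Channel-last input: the TensorFlow default, works on CPU.
x_nhwc = tf.random.normal([1, 32, 32, 3])
y_nhwc = tf.nn.convolution(x_nhwc, filters, padding="SAME", data_format="NHWC")

# Channel-first input, as ONNX provides it. The op must be told explicitly;
# on many CPU builds this kernel is not available, which is one reason onnx-tf
# transposes to NHWC when the target device is CPU.
x_nchw = tf.random.normal([1, 3, 32, 32])
y_nchw = tf.nn.convolution(x_nchw, filters, padding="SAME", data_format="NCHW")
```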
Hi @chinhuang007, thanks for your reply. I will try to add a sys_config.device option ('Mobile') optimized for TFLite, following the approach of @maartenvds.