onnx2tf
pre_explicit_broadcast should not expand scalar tensor
Issue Type
Feature Request
OS
Linux
onnx2tf version number
1.19.7
onnx version number
1.15.0
onnxruntime version number
1.16.3
onnxsim (onnx_simplifier) version number
0.4.33
tensorflow version number
2.15.0
Download URL for ONNX
Parameter Replacement JSON
none
Description
-
The converted tflite graph cannot run through ArmNN with the Arm Compute Library GPU backend (which is faster than tflite on the native CPU).
-
For ONNX nodes Input -> Gemm -> Mul (with a scalar), using batch=1:
The converted tflite graph contains a fully connected layer with a two-dimensional bias, while the original bias is one-dimensional:
Since the new dimension is 1, mathematically the converted tflite graph is still correct. Unfortunately, Arm Compute Library currently errors out if the bias has more than one dimension: https://github.com/ARM-software/ComputeLibrary/blob/c2a79a4b8c51ce835eaf984f3a1370447b3282c4/src/cpu/operators/CpuFullyConnected.cpp#L431
- It appears that the extra dimension is added in pre_explicit_broadcast: https://github.com/PINTO0309/onnx2tf/blob/4e8f29129e6ea03f45f9d6520cc077522a790249/onnx2tf/utils/common_functions.py#L845:L858. Can this function be changed to do nothing if the tensor is a scalar?
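For illustration, a pattern of this shape can be produced with a few lines of PyTorch; the layer sizes and the scalar below are placeholders, not values from the attached model.

# Minimal sketch (illustrative only): a Linear layer exported as Gemm with a
# 1-D bias, followed by multiplication with a Python scalar, batch fixed to 1.
import torch

class GemmMul(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(512, 1000)  # exported as Gemm with a 1-D bias (C)

    def forward(self, x):
        return self.fc(x) * 1.5  # scalar Mul directly after the Gemm

torch.onnx.export(GemmMul(), torch.randn(1, 512), "gemm_mul_repro.onnx", opset_version=13)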
It is a pain in the ass to generate and test the ONNX myself for your request. Please provide me with an ONNX file, even a partial model. I know all I have to do is write a few lines of code in PyTorch. But even that is a pain in the ass.
For sure, here is an ONNX file to reproduce: https://mythic.box.com/s/wmwsrpd41vpe27pqv1k3c7z3lb2daxqr
The tool has an option to optimize for the GPU Delegate, but I noticed a bug in Gemm's implementation during testing, so I fixed it and released v1.19.8.
https://github.com/PINTO0309/onnx2tf/releases/tag/1.19.8
Gemm
- Fixed an Abort bug when converting models with --optimization_for_gpu_delegate or -ofgd.
(onnx / before tflite / after tflite comparison screenshots)
-
Thanks for the fix. Unfortunately, I don't want to use --optimization_for_gpu_delegate since it actually makes ArmNN run slower. Is it possible to have a fix that does not require --optimization_for_gpu_delegate?
I think you have misunderstood something; pre_explicit_broadcast is irrelevant here. The value corresponding to the bias (onnx: C) is already processed as a scalar inside onnx2tf. However, when the model is converted to tflite, the converter generates a bias of [1, 1000] on its own. What is at stake in this issue is the C of ONNX, i.e., the bias. NNAPI has a long-standing problem with properly handling bias.
https://github.com/PINTO0309/onnx2tf/blob/6d0c3d828b1f461c11a8516600acbc9698fdbf10/onnx2tf/ops/Gemm.py#L194-L205
# cast to attribute data type
x = tf.cast(x, tf.float32)
y = tf.cast(y, tf.float32)
z = tf.cast(z, tf.float32)
if not optimization_for_gpu_delegate:
    if z is not None:
        result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) + tf.convert_to_tensor(beta) * z  # <----- here
    else:
        result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) + tf.convert_to_tensor(beta)
else:
    result = tf.convert_to_tensor(alpha) * tf.matmul(x, y) - (tf.convert_to_tensor(beta) * z) * tf.convert_to_tensor(-1.0, dtype=z.dtype)
Here is a debug printout that proves the bias is converted without being expanded:
beta
1.0
# onnx: C, TFLite: z
z.shape
TensorShape([1000])
(tf.convert_to_tensor(beta) * z).shape
TensorShape([1000])
I have known from the beginning that the GPU Delegate cannot handle a bias with more than 2 dimensions. However, there is a bug in the tflite converter where it does not produce a scalar bias. So I implemented a workaround that I don't like: the GPU optimization option.
You should file an issue with the NNAPI repository or the TensorFlow repository. By the way, I already raised that discussion in a TensorFlow issue over a year ago, but it has been ignored.
Good luck.
I ran an ONNX model that does not have a Mul node following the Gemm. The converted tflite graph has a FullyConnected node with a one-dimensional bias, as expected. Here is the ONNX graph: https://mythic.box.com/s/e3n6a0grvtvhgf5x37ve4c568n87ilnq
With my original ONNX graph, if I change https://github.com/PINTO0309/onnx2tf/blob/4e8f29129e6ea03f45f9d6520cc077522a790249/onnx2tf/utils/common_functions.py#L845:L858 to not extend the tensor when the input is a scalar, then the converted tflite graph fuses the Mul into the fully connected layer, and the layer has a one-dimensional bias as expected as well.
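The change I tested is essentially an early-return guard of this form; a simplified sketch, with the parameter names assumed from common_functions.py and the rest of the function left as is.

def pre_explicit_broadcast(const_or_var_1, const_or_var_2):
    # Skip dimension expansion entirely when either operand is a scalar;
    # scalar * tensor broadcasts correctly without adding dimensions.
    if len(const_or_var_1.shape) == 0 or len(const_or_var_2.shape) == 0:
        return const_or_var_1, const_or_var_2
    # ... original expansion / transpose logic continues below ...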
That's very strange... It will take some time to do a little research. I can only guess at this point, but it is possible that the tflite optimizer is doing something when it optimizes the following Mul, rather than this being a Gemm conversion issue.
Somehow, I could roughly guess what the optimizer was doing.
repro_new_onnx_model_v2.onnx.zip
repro_new_onnx_model_v2_1.onnx.zip
repro_new_onnx_model_v2_1_float32.tflite.zip
PS:
- Batch size: the problem appears only when the dynamic batch size (?) is changed to a fixed number such as 1.
- Without the -b option: onnx2tf -i repro_new_onnx_model_v2.onnx
- With the -b option: onnx2tf -i repro_new_onnx_model_v2.onnx -b 1
Try it.
Fixes: https://github.com/PINTO0309/onnx2tf/releases/tag/1.19.10
Thanks for the quick fix. The fix won't work if the graph has a series of Mul or Div nodes following the Gemm; both the Mul and the Div are fused into the Gemm, which again produces a two-dimensional bias.
Here is an ONNX file with Gemm -> Mul -> Div: https://mythic.box.com/s/i6ginfybruy1ibeg5pmzluxh6vgj9ye3
I understood. However, I am skeptical that we really need to automatically support that pattern in onnx2tf. I understand that a clean conversion would be great for everyone, but the two consecutive scalar Mul and Div should be able to be pre-fused before exporting to ONNX.
I don't know how you are generating your model, but wouldn't you just calculate 1.9921875 / 1.072122573852539 = 1.858171396 in advance?
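In other words, something like the following before export; the constants are the ones mentioned above, and where the scales live in your model is of course an assumption.

# Pre-fusing the two consecutive scalar ops before ONNX export (sketch):
mul_scale = 1.9921875
div_scale = 1.072122573852539
fused_scale = mul_scale / div_scale  # ~1.858171396
# The exported graph then contains Gemm -> Mul(fused_scale) instead of
# Gemm -> Mul -> Div, which the 1.19.10 fix already converts cleanly.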
This is an ONNX model that relies on ONNX Runtime to do graph optimization (including fusion), but ONNX Runtime doesn't export the optimized model since its optimization is done in memory only. I might be able to trace its PyTorch model and do the fusion there, where the operators are separated for different quantization needs per hardware backend, although that becomes irrelevant for float32 in the final tflite model.
Is it absolutely necessary for pre_explicit_broadcast to extend the scalar tensor? explicit_broadcast does not broadcast the scalar tensor, so why does pre_explicit_broadcast need to do that? It would be an easier change compared to the pattern-matching change.
why does pre_explicit_broadcast need to do that? It would be an easier change compared to the pattern-matching change.
pre_explicit_broadcast is not necessary if you only want to adapt the tool to the convenience of your own model. I have already tested about 600 different model conversions, including a more complex model with 150,000 OPs. The dimensional conversion is not yet perfect, but so far I have determined that extending the dimensions and then transposing and broadcasting is the best means of reducing the probability of error. The problem is not as simple as you think.
I'll look into the Mul -> Div, Mul -> Mul, Div -> Mul -> Div -> Mul, Mul -> Add, Add -> Mul, ... solution in a bit.
Idea note: search for all consecutive scalar operations after the Gemm; it doesn't matter whether there are 2 or a million, just keep searching for the next scalar operation after the Gemm. However, if an output has two branches, the process becomes too complicated, so the search ends there.
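A rough sketch of that search, using the onnx_graphsurgeon-style node API that onnx2tf already relies on; the helper name and the set of scalar ops are illustrative.

SCALAR_OPS = {'Mul', 'Div', 'Add', 'Sub'}

def consecutive_scalar_ops_after_gemm(gemm_node):
    # Walk forward from a Gemm through element-wise ops whose other input is a
    # scalar constant; stop as soon as an output branches to several consumers.
    chain = []
    node = gemm_node
    while True:
        consumers = [n for out in node.outputs for n in out.outputs]
        if len(consumers) != 1:
            break  # branching (or dead-end) output: the search ends here
        nxt = consumers[0]
        if nxt.op not in SCALAR_OPS:
            break
        consts = [i for i in nxt.inputs if hasattr(i, 'values')]
        if len(consts) != 1 or consts[0].values.size != 1:
            break  # the other operand is not a scalar constant
        chain.append(nxt)
        node = nxt
    return chain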
I think the consecutive-scalar pattern search would unfortunately add some complexity.
I looked at the code change: https://github.com/PINTO0309/onnx2tf/commit/621bd27c332df916bcc625b2c801bed7822ce62e It seems the original intent was not aimed at scalar tensors, so perhaps a workaround is to have an extra user-specified option to not extend the tensor for Mul/Div nodes etc. But again, it's a user-defined option, which I understand adds extra maintenance liability.
I'm fine to close the issue as is, and I will try to clean the graph on my end.
Just wanted to add that I also have an issue with multidimensional bias with this model.
ONNX: (graph screenshot)
TFLite: (graph screenshot)
Extrapolating from #577, I was able to change if (is_scalar_1 or is_scalar_2) and graph_node.i().op == 'Gemm': to if (is_scalar_1 or is_scalar_2) and graph_node.i().op == 'Gemm' or graph_node.o().op == 'Sigmoid': to produce 1D bias. This is just a hack, of course.
There are so many patterns to deal with that it seems quite difficult to make a mechanical decision. (I understand that it is a hack.)
I think it'd be almost as good to let the user specify the nodes for which they want to skip expansion. It could be the actual names of the nodes, or, to be more powerful, it could accept a lambda that takes a node and returns a boolean. Getting it right automatically every time is difficult, but it's much easier for a particular model.
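A hypothetical sketch of what that could look like; none of these names exist in the current onnx2tf API, they are made up to illustrate the idea.

import numpy as np

def should_skip_expansion(node_name, tensor, skip_node_names=None, skip_predicate=None):
    # True if pre_explicit_broadcast should leave this tensor untouched.
    if skip_node_names and node_name in skip_node_names:
        return True
    if skip_predicate is not None and skip_predicate(node_name, tensor):
        return True
    return False

# Example predicate: skip expansion whenever the operand is a single value.
skip_scalars = lambda name, t: np.size(t) == 1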
I see. That might be a good idea.
- Idea note: If no change occurs in the shape between the input tensor and the output tensor of the operation, skip the broadcast regardless of whether the previous operation was Gemm or not. It should be effective no matter how many arithmetic operations on scalar values are performed in succession. (See the sketch after this list.)
- repro_new_onnx_model_v2_2.onnx: repro_new_onnx_model_v2_2.onnx.zip
  - Without the -b option: (ONNX / TFLite graph screenshots)
  - With the -b 1 option: (ONNX / TFLite graph screenshots)
- ViT-B-16__openai_partial.onnx: ViT-B-16__openai_partial_cut.onnx.zip (ONNX / TFLite graph screenshots)
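A minimal sketch of that shape check as I read the idea note; this is not the actual 1.24.1 implementation.

import numpy as np

def broadcast_needed(input_shape, const_operand, output_shape):
    # If the element-wise op leaves the tensor shape unchanged and the other
    # operand is a single value, dimension expansion can be skipped, no matter
    # how many such scalar ops follow one another.
    if list(input_shape) == list(output_shape) and np.size(const_operand) == 1:
        return False
    return True

# e.g. a [1, 1000] Gemm output multiplied by a scalar keeps its shape:
assert not broadcast_needed([1, 1000], 1.858171396, [1, 1000])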
Fix: https://github.com/PINTO0309/onnx2tf/releases/tag/1.24.1
Wow, thanks for fixing this!