
Turning off back-to-back optimizer does not disable fusing batch normalization layers into convolutional layers

Mypathissional opened this issue 2 years ago · 6 comments

**Describe the bug**
Hi, I was converting CenterNet (CenterNet HourGlass104 512x512) from the TensorFlow Object Detection API (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) with the back-to-back optimizer turned off, to disable the fusion of batchnorm into conv layers, following https://github.com/onnx/tensorflow-onnx/issues/1702. The problem is that even though the back-to-back optimizer is turned off, the convolutions and batchnorms are still fused together. Where else can this optimization occur? Using tensorflow=2.8.0, onnx=1.11.0, tf2onnx=1.9.3/1190aa and opset 15.

Mypathissional avatar May 05 '22 13:05 Mypathissional

The fusion logic for Conv and BatchNormalization lives in the back-to-back optimizer. Could you check whether your model conversion passes through the code below? https://github.com/onnx/tensorflow-onnx/blob/c67bcfb580be741ece8d9978a9b57bd2ce7367ee/tf2onnx/optimizer/back_to_back_optimizer.py#L191
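For context, tf2onnx keeps its optimizer passes in a registry in `tf2onnx/optimizer/__init__.py`, and "turning off" the back-to-back pass amounts to removing its entry before `optimize_graph` iterates over the table. A stdlib-only sketch of that registry pattern (illustrative only, not tf2onnx's actual code; the pass names and lambdas are placeholders):

```python
# Illustrative optimizer registry, mimicking the pattern described in the
# thread (NOT tf2onnx's actual implementation).
optimizers = {
    "transpose_optimizer": lambda graph: graph,
    "back_to_back_optimizer": lambda graph: graph,  # would fuse Conv + BatchNormalization
}

# "Turning off" a pass = dropping its registry entry before optimization runs.
optimizers.pop("back_to_back_optimizer", None)

graph = object()  # stand-in for the graph being optimized
for name, run_pass in optimizers.items():
    graph = run_pass(graph)

print(sorted(optimizers))  # ['transpose_optimizer']
```

If fusion still happens after the entry is removed, the fusion must be occurring somewhere other than this registry's passes, which is exactly what the rest of the thread establishes.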

hwangdeyu avatar May 06 '22 07:05 hwangdeyu

@hwangdeyu After I turned off the back-to-back optimizer at the beginning of the optimizer `__init__` file, I added a print statement both at the beginning of `_optimize_conv_batchnorm_fusion(g, node, consumer_nodes)` and in `optimize_graph(graph, catch_errors=True, optimizers=None)` in the optimizer `__init__` file. Execution enters `optimize_graph` but never `_optimize_conv_batchnorm_fusion`, yet some kind of fusion is still happening, because the node name changes.

Mypathissional avatar May 06 '22 11:05 Mypathissional

I guess the problem might not be in the fusing but in the type of batch normalization layer used, which is `SyncBatchNormalization`. I have prepared a minimal example: for the code below, the exported model does not contain the batchnorms.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    net = tf.keras.Sequential()
    net.add(tf.keras.layers.Conv2D(2, 3))
    net.add(tf.keras.layers.experimental.SyncBatchNormalization())
net.build((1, 30, 30, 2))
net.save("~/Desktop/conv_block")
```

Can this be the problem?

Mypathissional avatar May 06 '22 13:05 Mypathissional

Hi @Mypathissional, I think this is expected behavior for tensorflow-onnx.

When I run the conversion script, there is no BatchNormalization op at all, even before the optimizers run:

```
optimizer before: Counter({'Identity': 7, 'Const': 2, 'Transpose': 2, 'Placeholder': 1, 'Conv': 1, 'Mul': 1})
optimizer after: Counter({'Transpose': 2, 'Placeholder': 1, 'Const': 1, 'Conv': 1})
```
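Counters like these can be reproduced by tallying node op types in an exported graph, e.g. `Counter(n.op_type for n in onnx.load(path).graph.node)`. A minimal stdlib sketch, using the "after" ops written out as a hardcoded list (the list is just the counter above expanded, so the snippet runs without the onnx package):

```python
from collections import Counter

# Node op types of the optimized graph, expanded from the "optimizer after"
# counter above. With the real onnx package this list would come from:
#   [n.op_type for n in onnx.load("model.onnx").graph.node]
op_types = ["Transpose", "Transpose", "Placeholder", "Const", "Conv"]

counts = Counter(op_types)
print(counts)  # Counter({'Transpose': 2, 'Placeholder': 1, 'Const': 1, 'Conv': 1})
```

Note there is no `BatchNormalization` (and no `Mul` after optimization), so there is nothing left for the back-to-back optimizer to fuse or skip.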

However, if we change `tf.keras.layers.experimental.SyncBatchNormalization()` to `tf.keras.layers.BatchNormalization()`, the op does show up:

```
optimizer before: Counter({'Identity': 6, 'Const': 5, 'Transpose': 4, 'Placeholder': 1, 'Conv': 1, 'BatchNormalization': 1})
```

hwangdeyu avatar May 10 '22 09:05 hwangdeyu

@hwangdeyu Can you tell me, just for my understanding, what happens when an operation is present in the saved model but has no counterpart among the ONNX operations? Is the operation simply skipped?

Mypathissional avatar May 10 '22 09:05 Mypathissional

> @hwangdeyu Can you tell me, just for my understanding, what happens when an operation is present in the saved model but has no counterpart among the ONNX operations? Is the operation simply skipped?

I don't know how `experimental.SyncBatchNormalization()` works at the implementation level. But from what I've seen so far, the op is not present in the saved model either. There is a `FusedBatchNormV3` among the `tf.keras.layers.BatchNormalization()` saved model ops:

```
['Placeholder', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Identity', 'NoOp', 'NoOp', 'Conv2D', 'NoOp', 'Identity', 'FusedBatchNormV3', 'Identity', 'Identity', 'Identity']
```

But that op is missing from the `tf.keras.layers.experimental.SyncBatchNormalization()` saved model ops:

```
['Placeholder', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Const', 'Identity', 'NoOp', 'NoOp', 'Conv2D', 'NoOp', 'Identity', 'Mul', 'Identity', 'Identity', 'Identity', 'Identity']
```
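As background on why a `Mul` plus constants can stand in for the whole layer: at inference time any batch normalization reduces to a per-channel affine transform, which TF can pre-fold into graph constants before tf2onnx ever sees a batchnorm op. A hedged numeric sketch of that folding (plain Python, not tf2onnx or TF code; the 1e-3 epsilon is the Keras `BatchNormalization` default):

```python
import math

# At inference, batchnorm computes
#   y = gamma * (x - mean) / sqrt(var + eps) + beta
# which is an affine transform y = scale * x + offset with per-channel constants.
def fold_batchnorm(gamma, beta, mean, var, eps=1e-3):
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    offset = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, offset

# Two channels with made-up statistics:
scale, offset = fold_batchnorm(gamma=[1.0, 2.0], beta=[0.5, -0.5],
                               mean=[0.0, 1.0], var=[1.0, 4.0])
# Once scale/offset are baked into constants like this, the exported graph
# only needs a Mul (and an Add), not a FusedBatchNormV3.
```

If that folding happens inside TF's saved-model export, there is no batchnorm left for tf2onnx to convert, matching the op lists above.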

hwangdeyu avatar May 11 '22 04:05 hwangdeyu