
can't convert model from tensorflow to caffe with tf.layers.batch_normalization

Open 2h4dl opened this issue 5 years ago • 7 comments

Platform (like ubuntu 16.04/win10): ubuntu 16.04
Python version: python 2.7
Source framework with version (like Tensorflow 1.4.1 with GPU): Tensorflow 1.12
Destination framework with version (like CNTK 2.3 with GPU): caffe
Pre-trained model path (webpath or webdisk path):

Running scripts:

mmconvert -sf tensorflow -in model-lenet-17000.meta -iw model-lenet-17000 --dstNodeName dense2/outputs -df caffe -om caffe-lenet

Error Info:

TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/Switch].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/Switch].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_3].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_4].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_3].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_4].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm/Switch].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch].
Traceback (most recent call last):
  File "/usr/local/bin/mmconvert", line 11, in <module>
    sys.exit(_main())
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/_script/convert.py", line 102, in _main
    ret = convertToIR._convert(ir_args)
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/_script/convertToIR.py", line 115, in _convert
    parser.run(args.dstPath)
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/common/DataStructure/parser.py", line 22, in run
    self.gen_IR()
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/tensorflow/tensorflow_parser.py", line 424, in gen_IR
    func(current_node)
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/tensorflow/tensorflow_parser.py", line 800, in rename_FusedBatchNorm
    self.set_weight(source_node.name, 'scale', self.ckpt_data[scale.name])
KeyError: u'conv1/batch_normalization/gamma/read'

The conversion succeeds if I restore this model and save it again, but I noticed a difference during the conversion: the re-saved model seems to have dropped some parameters.

Source model conversion:

Parse file [model-lenet-17000.meta] with binary format successfully.
Tensorflow model file [model-lenet-17000.meta] loaded successfully.
Tensorflow checkpoint file [model-lenet-17000] loaded successfully. [36] variables loaded.

Re-saved model conversion:

Parse file [model-lenet.meta] with binary format successfully.
Tensorflow model file [model-lenet.meta] loaded successfully.
Tensorflow checkpoint file [model-lenet] loaded successfully. [14] variables loaded.

22 variables (36 → 14) were dropped by the re-save.

2h4dl avatar Apr 26 '19 05:04 2h4dl
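For background (not part of the original thread): at inference time, batch normalization is just a fixed per-channel affine transform built from the four saved variables (gamma, beta, moving mean, moving variance) — gamma being the very variable the KeyError above complains about. A minimal pure-Python sketch, using the default epsilon of tf.layers.batch_normalization (1e-3):

```python
import math

def bn_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference-mode batch norm for one channel value:
    y = gamma * (x - mean) / sqrt(var + eps) + beta."""
    return gamma * (x - moving_mean) / math.sqrt(moving_var + eps) + beta

def fold_bn(gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Fold the four saved variables into the single scale/shift pair
    that a Caffe-style BatchNorm + Scale stage effectively applies."""
    scale = gamma / math.sqrt(moving_var + eps)
    shift = beta - moving_mean * scale
    return scale, shift

# The folded affine form agrees with the direct formula:
scale, shift = fold_bn(2.0, 0.5, 1.0, 4.0)
x = 3.0
assert abs(bn_inference(x, 2.0, 0.5, 1.0, 4.0) - (scale * x + shift)) < 1e-9
```

Nothing training-specific survives this transform, which is one reason an inference-only graph can carry far fewer variables than the training graph.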

Hi @2h4dl, would you mind uploading the model so I can check it to solve the problem?

rainLiuplus avatar Apr 28 '19 02:04 rainLiuplus

@rainLiuplus https://drive.google.com/file/d/1zizj49H0pdkWmEXshPaXI3qtUd7r3cbG/view?usp=sharing, please check it.

2h4dl avatar Apr 28 '19 03:04 2h4dl

Hi @2h4dl, I think the first attempt failed because of the tf.cond inside the batch norm. The model might be using tf.contrib.layers.batch_norm.

Similar problems also happen when others try to import the frozen graph, as described here.

So, what changed when you re-saved the model, and how big is the re-saved model? I suspect the batch norm is the cause.

JiahaoYao avatar Apr 28 '19 05:04 JiahaoYao
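To illustrate the point above (a sketch, not from the thread; it uses tf.compat.v1 so it also runs under TF 2.x): when the condition is a tensor, TF1-style control flow lowers tf.cond to the Switch/Merge ops that MMdnn's TensorflowEmitter rejects, while a branch chosen at graph-construction time produces none.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()
tf1.disable_control_flow_v2()  # use TF1 control flow (Switch/Merge lowering)

# Condition known when the graph is built (like training=False):
# only the chosen branch is emitted.
g1 = tf1.Graph()
with g1.as_default():
    x = tf1.placeholder(tf.float32, [4])
    y = x * 2.0  # the single, statically selected branch

# Condition is a tensor (like training=<tf.bool placeholder>):
# tf.cond lowers to Switch/Merge nodes.
g2 = tf1.Graph()
with g2.as_default():
    x = tf1.placeholder(tf.float32, [4])
    flag = tf1.placeholder(tf.bool, [])
    y = tf1.cond(flag, lambda: x * 2.0, lambda: x * 3.0)

print("Switch" in {op.type for op in g1.get_operations()})  # False
print("Switch" in {op.type for op in g2.get_operations()})  # True
```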

Hi @JiahaoYao. Do you mean tf.layers.batch_normalization is the same as tf.contrib.layers.batch_norm? After re-saving the model, some variables are lost and the model is smaller than before. How should I use batch norm in TensorFlow to avoid this?

2h4dl avatar Apr 28 '19 05:04 2h4dl

Hi @2h4dl, if re-saving the model eliminates the tf.cond's, it is plausible that your model gets smaller, because of the parameters tied to the tf.cond branches. We have run into this kind of issue before, as mentioned here. I think simply using tf.layers.batch_normalization is safe.

JiahaoYao avatar Apr 28 '19 07:04 JiahaoYao

Hi @JiahaoYao. As you mentioned, after re-saving the model the tf.cond is gone. But I still have a question, sorry about that. The wiki says tf.cond exists in slim, not tf.layers, yet I trained this model with tf.layers.batch_normalization. Is there something wrong with my model code? Here it is:

weights = get_weights(w_shape, regualizer)
conv = conv2d(x_input, weights, padding)
# If `istrain` is a tensor (e.g. a tf.bool placeholder) rather than a
# Python bool, batch_normalization inserts a tf.cond here, which is
# where the Switch nodes in the error above come from.
norm = tf.layers.batch_normalization(conv, training=istrain)
conv_relu = activation(norm)

2h4dl avatar Apr 28 '19 07:04 2h4dl
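One workaround sketch (not from the thread; shapes and constants are made up): build a separate export graph where batch norm is expressed directly with tf.nn.batch_normalization and the saved moving statistics, so no tf.cond — and hence no Switch node — ever appears. In practice you would restore gamma/beta/moving_mean/moving_variance from the checkpoint instead of using constants.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

C = 8  # channel count; arbitrary for the sketch
g = tf1.Graph()
with g.as_default():
    x = tf1.placeholder(tf.float32, [None, 28, 28, C])
    # Stand-ins for values restored from the checkpoint:
    gamma = tf.constant([1.0] * C)        # scale
    beta = tf.constant([0.0] * C)         # offset
    moving_mean = tf.constant([0.0] * C)
    moving_var = tf.constant([1.0] * C)
    # Pure inference batch norm: one fixed affine op, no control flow.
    y = tf.nn.batch_normalization(x, moving_mean, moving_var,
                                  beta, gamma, variance_epsilon=1e-3)

assert "Switch" not in {op.type for op in g.get_operations()}
```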

Hi @2h4dl, I have also met this problem. How did you fix it in the end?

xyl3902596 avatar Aug 14 '20 09:08 xyl3902596