
Are the compression operations still executed at inference time?

Open as754770178 opened this issue 6 years ago • 21 comments

I read the code in export_pb_tflite_models.py. The following code compresses the model, but I think these operations will be saved in the *.pb file, and the new output x relies on kernel_gthr, kernel_shrk and the conv operations. Will these operations be executed at inference time?

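      # 0/1 kernel of shape (1, 1, C_in, C_kept): a 1x1 conv that selects the retained (nnzs) input channels
      kernel_gthr = np.zeros((1, 1, kernel_chn_in, nnzs.size))
      kernel_gthr[0, 0, nnzs, np.arange(nnzs.size)] = 1.0
      # shrunk kernel: keep only the weights attached to the retained input channels
      kernel_shrk = kernel[:, :, nnzs, :]
      # 1x1 "gather" conv, followed by the shrunk version of the original conv
      x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME', data_format=data_format)
      x = tf.nn.conv2d(
        x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)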

as754770178 · Jan 25 '19 03:01

kernel_gthr and kernel_shrk will be stored as constants, and x relies on op.inputs[0], which is the input of the original convolutional layer.

jiaxiang-wu · Jan 25 '19 04:01
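To see why the extra 1x1 conv amounts to channel selection, here is a minimal NumPy sketch (not PocketFlow code) that builds kernel_gthr the same way as the snippet above and applies it as a stride-1 1x1 convolution via np.einsum on an NHWC tensor. The shapes, the input x, and the chosen nnzs indices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: batch 2, 5x5 feature map, 8 input channels (NHWC layout).
kernel_chn_in = 8
x = rng.standard_normal((2, 5, 5, kernel_chn_in)).astype(np.float32)

# Indices of the input channels that survive pruning (hypothetical choice).
nnzs = np.array([0, 3, 4, 7])

# Same construction as in the snippet: a 0/1 kernel of shape (1, 1, C_in, C_kept)
# with exactly one 1.0 in each output column.
kernel_gthr = np.zeros((1, 1, kernel_chn_in, nnzs.size), dtype=np.float32)
kernel_gthr[0, 0, nnzs, np.arange(nnzs.size)] = 1.0

# A stride-1 1x1 conv is a per-pixel matrix product over the channel axis.
y_conv = np.einsum('nhwc,ck->nhwk', x, kernel_gthr[0, 0])

# Plain channel selection, i.e. what a gather on the channel axis returns.
y_gather = x[..., nnzs]

print(np.array_equal(y_conv, y_gather))
```

Both paths give the same tensor; the difference is cost. The 1x1 conv still spends C_in x C_kept multiply-adds per pixel to copy data, whereas a gather only moves memory.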

Thanks.

  1. Will the model size be reduced? The transformation adds new variables, such as kernel_gthr and kernel_shrk, but does not delete the old ones.
  2. Will the amount of computation be reduced? The transformation adds new operations, such as
      x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME', data_format=data_format)
      x = tf.nn.conv2d(
        x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)

as754770178 · Jan 25 '19 04:01

  1. Old variables are removed from the *.pb and *.tflite models, since they are not used to compute the final outputs.
  2. If the kernel size is larger than 1 (e.g. a 3x3 conv), the FLOPs will be reduced. For 1x1 convs, we recommend using the "tf.gather + 3 x 3 conv" scheme for graph transformation.

jiaxiang-wu · Jan 25 '19 04:01
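To make the FLOP argument concrete, here is a back-of-the-envelope sketch that counts multiply-accumulates (MACs) for one conv layer before and after pruning half of its input channels, under both transformation schemes. The helper names and layer sizes are illustrative assumptions, not PocketFlow code, and the gather is treated as free, which ignores its memory traffic.

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulates of a k x k conv that produces an h_out x w_out x c_out map."""
    return k * k * c_in * c_out * h_out * w_out


def transformed_macs(k, c_in, c_kept, c_out, h_in, w_in, h_out, w_out, method):
    """MACs after pruning the input channels of a k x k conv from c_in to c_kept.

    '1x1_conv': a 1x1 gather conv (c_in -> c_kept) on the full-resolution input,
                followed by the shrunk k x k conv.
    'gather':   a channel gather (pure data movement, counted as 0 MACs),
                followed by the shrunk k x k conv.
    """
    shrunk = conv_macs(k, c_kept, c_out, h_out, w_out)
    if method == '1x1_conv':
        return conv_macs(1, c_in, c_kept, h_in, w_in) + shrunk
    if method == 'gather':
        return shrunk
    raise ValueError('unrecognized method: ' + method)


# Illustrative layer: 56x56 feature map, 64 -> 64 channels, half of the inputs kept.
for k in (3, 1):
    orig = conv_macs(k, 64, 64, 56, 56)
    for method in ('1x1_conv', 'gather'):
        new = transformed_macs(k, 64, 32, 64, 56, 56, 56, 56, method)
        print(f'k={k}  {method:8s}  ratio={new / orig:.3f}')
```

With these numbers, the 3x3 layer drops to about 56% of its original MACs under the "1x1 conv" scheme (50% with "gather"), whereas the 1x1 layer stays at 100% under the "1x1 conv" scheme, which is consistent with recommending the gather scheme for 1x1 convs.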

These are the *.pb files generated by export_pb_tflite_models.py. The *.pb file of the compressed model is larger than that of the original model:

-rw-rw-r--. 1 zgz zgz  95082914 Jan 25 15:53 model_original.pb
-rw-rw-r--. 1 zgz zgz 104845311 Jan 25 15:53 model_transformed.pb

With the models_resnet_20_at_cifar_10.tar.gz model provided by PocketFlow, the generated model_transformed.pb is not half the size of model_original.pb either:

-rw-rw-r--. 1 zgz  zgz  1148712 Jan 25 18:14 model_original.pb
-rw-rw-r--. 1 zgz  zgz  1028228 Jan 25 18:14 model_transformed.pb

as754770178 · Jan 25 '19 06:01

Which model are you using?

jiaxiang-wu · Jan 25 '19 06:01

I modified the code because it reported the error ValueError: Cannot feed value of shape (1, 224, 224, 3) for Tensor 'import/net_input:0', which has shape '(32, 224, 224, 3)'. So I changed net['input_data'] = np.zeros(tuple([1] + list(net['input_shape'][1:])), dtype=np.float32) to net['input_data'] = np.zeros(list(net['input_shape']), dtype=np.float32).

as754770178 · Jan 25 '19 06:01

The first result uses my custom resnet_v1_50. The second uses the resnet_v1_20 model provided by PocketFlow.

as754770178 · Jan 25 '19 06:01

Which *.ckpt model are you using? Or, what value do you pass for the --model_dir flag when calling export_pb_tflite_models.py?

jiaxiang-wu · Jan 25 '19 06:01

This is the log for the first result. It shows that the channels are pruned.

2019-01-25 18:30:51.087294: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-25 18:30:51.385927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-25 18:30:51.385963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:51.651604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:51.651645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:30:51.651654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:30:51.652008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:input: net_input:0 / output: net_output:0
INFO:tensorflow:input's shape: (32, 224, 224, 3)
INFO:tensorflow:output's shape: (32, 102)
2019-01-25 18:30:53.781431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:53.781483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:53.781493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:30:53.781505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:30:53.781708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:Froze 161 variables.
Converted 161 variables to const ops.
INFO:tensorflow:/home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/model_original.pb generated
2019-01-25 18:30:54.614066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:54.614113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:54.614121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:30:54.614128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:30:54.614337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:input: import/net_input:0 / output: import/net_output:0
INFO:tensorflow:outputs from the *.pb model: [[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 ...
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]]
2019-01-25 18:30:57.127393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:57.127461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:57.127473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:30:57.127483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:30:57.127749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:input: net_input:0 / output: net_output:0
INFO:tensorflow:input's shape: (32, 224, 224, 3)
INFO:tensorflow:output's shape: (32, 102)
2019-01-25 18:30:59.110336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:59.110386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:59.110398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:30:59.110407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:30:59.110653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:transforming OP: model/resnet_v1_50/conv1/Conv2D
INFO:tensorflow:reducing 3 channels to 3
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 252
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 63
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 253
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 65
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 65
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 256
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 65
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 261
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 128
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 512 channels to 255
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 509
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 128
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 513
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 130
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 505
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 130
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 516
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 128
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 512
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 522
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 512 channels to 252
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 512 channels to 253
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 1024 channels to 513
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 2048 channels to 1019
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 512 channels to 258
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 512 channels to 260
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 2048 channels to 1019
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 512 channels to 257
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 512 channels to 259
INFO:tensorflow:transforming OP: model/resnet_v1_50/logits/Conv2D
INFO:tensorflow:reducing 2048 channels to 2048
2019-01-25 18:31:08.059577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:31:08.059645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:31:08.059656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:31:08.059662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:31:08.060404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:Froze 107 variables.
Converted 107 variables to const ops.
INFO:tensorflow:/home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/model_transformed.pb generated
2019-01-25 18:31:09.272575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:31:09.272622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:31:09.272631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-01-25 18:31:09.272638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-01-25 18:31:09.272826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:input: import/net_input:0 / output: import/net_output:0
INFO:tensorflow:outputs from the *.pb model: [[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 ...
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
 [0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]]

as754770178 · Jan 25 '19 06:01
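The per-layer numbers in a log like this can be summarized with a small helper (hypothetical, not part of PocketFlow) that extracts every "reducing X channels to Y" line and computes the fraction of input channels kept. The sample below is copied from three lines of the log.

```python
import re

# Matches log lines such as "INFO:tensorflow:reducing 256 channels to 127".
REDUCE_RE = re.compile(r'reducing (\d+) channels to (\d+)')


def preserve_ratios(log_text):
    """Return (c_in, c_kept, ratio) for every 'reducing ... channels to ...' line."""
    return [(int(a), int(b), int(b) / int(a))
            for a, b in REDUCE_RE.findall(log_text)]


sample = """INFO:tensorflow:transforming OP: model/resnet_v1_50/conv1/Conv2D
INFO:tensorflow:reducing 3 channels to 3
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/logits/Conv2D
INFO:tensorflow:reducing 2048 channels to 2048"""

for c_in, c_kept, r in preserve_ratios(sample):
    print(f'{c_in:5d} -> {c_kept:5d}  ({r:.2f})')
```

Run over the full log, this makes it easy to see that most layers keep roughly half of their input channels, while the first conv and the logits layer keep all of them.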

I use the model saved in cp_best_path.

as754770178 · Jan 25 '19 06:01

For the ResNet-50 model, the 1x1 convs in each bottleneck structure cannot be compressed with the "1x1 conv + 3x3 conv" graph transformation scheme. For 1x1 convs, we recommend using the "tf.gather + 3 x 3 conv" scheme instead.

jiaxiang-wu · Jan 25 '19 06:01

  1. Are the convs in the residual branch compressed? And why does the *.pb file become larger?
  2. How do I use the "tf.gather + 3 x 3 conv" scheme?

as754770178 · Jan 25 '19 06:01

  1. The conv ops in the residual branch are compressed, but the *.pb file can sometimes become larger. Assume a 1x1 conv layer with 64 input channels is pruned to 32; then the number of original parameters is 1*1*64*c (c: number of output channels). With the "1x1 conv + 3x3 conv" scheme, this layer is decomposed into two layers: a 1x1 conv with 64 input channels and 32 output channels, followed by a 1x1 conv with 32 input channels and c output channels. The overall number of parameters is therefore 1*1*64*32 + 1*1*32*c, which is larger than the original when c is smaller than 64 (and equal to it when c = 64).
  2. Are you using the latest version of PocketFlow? Or, can you find the following code in export_pb_tflite_models.py?
      # replace channel pruned convolutional with cheaper operations
      if graph_trans_mthd == 'gather':
        x = tf.gather(op.inputs[0], nnzs, axis=1)
        x = tf.nn.conv2d(
          x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)
      elif graph_trans_mthd == '1x1_conv':
        x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME', data_format=data_format)
        x = tf.nn.conv2d(
          x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)
      else:
        raise ValueError('unrecognized graph transformation method: ' + graph_trans_mthd)

jiaxiang-wu · Jan 25 '19 06:01
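The parameter-count argument in the reply can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names; biases and the int indices stored for tf.gather are ignored, and the ResNet-50 line assumes the standard bottleneck width of 64 output channels for block1's conv1.

```python
def params_original(c_in, c_out, k=1):
    """Weights of the original k x k conv (biases ignored)."""
    return k * k * c_in * c_out


def params_1x1_scheme(c_in, c_kept, c_out, k=1):
    """'1x1_conv' scheme: dense 0/1 gather kernel (c_in x c_kept) plus the shrunk conv."""
    return c_in * c_kept + k * k * c_kept * c_out


def params_gather_scheme(c_kept, c_out, k=1):
    """'gather' scheme: only the shrunk conv; the c_kept gather indices are negligible."""
    return k * k * c_kept * c_out


# The example from the reply: a 1x1 conv whose 64 input channels are pruned to 32.
for c in (32, 64, 128):
    print(c, params_original(64, c), params_1x1_scheme(64, 32, c), params_gather_scheme(32, c))

# A bottleneck conv1 from the ResNet-50 log: 256 -> 127 inputs, assuming 64 outputs.
print(params_original(256, 64), params_1x1_scheme(256, 127, 64), params_gather_scheme(127, 64))
```

The 0/1 gather kernel is stored as a dense constant even though it contains only c_kept nonzero entries, so for 1x1 layers it can outweigh the savings from the shrunk kernel, which would explain a larger model_transformed.pb. The gather scheme avoids that constant entirely.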

Sorry, my code is not the latest version. I will pull the latest changes.

as754770178 · Jan 25 '19 06:01

Also, take a look at this PR #119

jiaxiang-wu · Jan 25 '19 06:01

I read the code in convert_data_format.py. Can variables from an NHWC-format model be imported directly into an NCHW-format model?

as754770178 · Jan 25 '19 07:01

Yes, the layout of the variables is the same.

jiaxiang-wu · Jan 25 '19 07:01
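In TensorFlow, tf.nn.conv2d takes its filter as [filter_height, filter_width, in_channels, out_channels] regardless of data_format; only the activations change layout. The NumPy sketch below illustrates this for a 1x1 conv: one kernel array serves both layouts, and the outputs agree up to a transpose. The shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# One kernel in [H, W, C_in, C_out] order, shared by both data formats.
kernel = rng.standard_normal((1, 1, 6, 4))

x_nhwc = rng.standard_normal((2, 5, 5, 6))
x_nchw = x_nhwc.transpose(0, 3, 1, 2)  # the same activations in NCHW order

# A 1x1 conv written for each data format, both consuming the identical kernel.
y_nhwc = np.einsum('nhwc,co->nhwo', x_nhwc, kernel[0, 0])
y_nchw = np.einsum('nchw,co->nohw', x_nchw, kernel[0, 0])

# Same result once the NCHW output is transposed back to NHWC.
print(np.allclose(y_nhwc, y_nchw.transpose(0, 2, 3, 1)))
```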

I also use this part. After applying this method, I found that the new model_transformed.pb contains additional tensors like these:

import/GatherV2/indices
import/GatherV2/axis
import/GatherV2
import/Conv2D/filter
import/Conv2D
import/pruned_model/resnet_model/initial_conv
import/pruned_model/resnet_model/batch_normalization/FusedBatchNorm
import/pruned_model/resnet_model/Relu
import/pruned_model/resnet_model/max_pooling2d/MaxPool
import/pruned_model/resnet_model/initial_max_pool
import/GatherV2_1/indices
import/GatherV2_1/axis
import/GatherV2_1
import/Conv2D_1/filter
import/Conv2D_1
import/pruned_model/resnet_model/batch_normalization_1/FusedBatchNorm
import/GatherV2_2/indices
import/GatherV2_2/axis
import/GatherV2_2
import/Conv2D_2/filter
import/Conv2D_2
import/pruned_model/resnet_model/batch_normalization_2/FusedBatchNorm
import/pruned_model/resnet_model/Relu_1
...

It does reduce the inference time. However, if I then quantize this *.pb with TensorRT, the result is slower than before. Have you considered this situation, i.e. quantization after pruning?

hzhyhx1117 · Jan 25 '19 09:01

  1. The inference speed of the pruned model is 72+ samples/sec, while that of the original model is 53+ samples/sec. This does not match the description in the PocketFlow documentation: when cp_uniform_preserve_ratio is 0.5, the pruned model should run 4 times as fast as the original model.

  2. Can tf.gather(op.inputs[0], nnzs, axis=3) be used in an NHWC-format model?

as754770178 · Jan 25 '19 09:01
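When comparing the original and pruned *.pb models, it helps to time both in exactly the same way, with warm-up iterations excluded. The following is a minimal sketch. run_batch is a hypothetical zero-argument callable that you would wrap around sess.run for each model, and the numbers it reports depend entirely on your hardware and batch size. Also note that a FLOP reduction may not translate one-to-one into wall-clock speed-up, since the inserted gather or 1x1 conv ops also take time.

```python
import time


def samples_per_sec(run_batch, batch_size, warmup=5, iters=50):
    """Throughput of run_batch(), a hypothetical zero-argument callable that
    runs one batch through the model (for example, a wrapper around sess.run)."""
    for _ in range(warmup):  # exclude one-off costs such as memory allocation
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


# Usage with a dummy workload standing in for the real model:
print(f'{samples_per_sec(lambda: sum(range(100_000)), batch_size=32):.1f} samples/sec')
```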

@hzhyhx1117 We are considering completely rewriting the graph (the conv layer and its corresponding input/output layers) to achieve a higher speed-up. However, this is harder to apply to all models, since the topology can be very complicated.

jiaxiang-wu · Feb 08 '19 01:02

@as754770178 tf.gather can be used in NHWC-format models, but it is less efficient there than in NCHW-format models.

jiaxiang-wu · Feb 08 '19 01:02
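The only change needed for NHWC is the channel axis: axis=1 in NCHW (as in the export code) versus axis=3 in NHWC. The NumPy sketch below uses np.take as a stand-in for tf.gather to show that both select the same channels. In NCHW each kept channel is a contiguous H x W plane, whereas in NHWC the kept channels are interleaved with the pruned ones at every pixel, which is one plausible reason for the lower efficiency. The shapes and nnzs values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

x_nchw = rng.standard_normal((2, 8, 5, 5)).astype(np.float32)
x_nhwc = np.ascontiguousarray(x_nchw.transpose(0, 2, 3, 1))  # same data, NHWC layout
nnzs = np.array([0, 3, 4, 7])  # retained channel indices (illustrative)

# The channel axis is 1 in NCHW and 3 in NHWC.
g_nchw = np.take(x_nchw, nnzs, axis=1)
g_nhwc = np.take(x_nhwc, nnzs, axis=3)

print(np.array_equal(g_nchw.transpose(0, 2, 3, 1), g_nhwc))
```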