PocketFlow
Is the compression process still executed during inference?
I read the code in export_pb_tflite_models.py. The following code compresses the model, but I think these operations will be saved in the pb file, and the new output x relies on kernel_gthr, kernel_shrk, and the conv operations. Will these operations be executed during inference?
kernel_gthr = np.zeros((1, 1, kernel_chn_in, nnzs.size))
kernel_gthr[0, 0, nnzs, np.arange(nnzs.size)] = 1.0
kernel_shrk = kernel[:, :, nnzs, :]
x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME', data_format=data_format)
x = tf.nn.conv2d(
x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)
kernel_gthr and kernel_shrk will be stored as constants, and x relies on op.inputs[0], which is the input of the original convolutional layer.
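To make the role of the two constants concrete, here is a minimal NumPy sketch of the same construction. It is not PocketFlow code: it assumes a 1x1 kernel, stride 1 and NHWC layout so that the convolution reduces to a per-pixel matrix product, and the helper conv1x1_nhwc and all sizes are hypothetical. It shows that kernel_gthr only selects the channels listed in nnzs, and that the two-step result matches the original pruned convolution.

```python
import numpy as np

def conv1x1_nhwc(x, kernel):
    """1x1 convolution with stride 1 on an NHWC tensor (a per-pixel matmul)."""
    # x: (N, H, W, C_in), kernel: (1, 1, C_in, C_out)
    return np.einsum('nhwc,co->nhwo', x, kernel[0, 0])

rng = np.random.default_rng(0)
kernel_chn_in, chn_out = 8, 4            # illustrative sizes (assumption)
x_in = rng.standard_normal((2, 5, 5, kernel_chn_in))

# A channel-pruned kernel: input channels outside `nnzs` are all zero.
nnzs = np.array([0, 3, 5])
kernel = np.zeros((1, 1, kernel_chn_in, chn_out))
kernel[:, :, nnzs, :] = rng.standard_normal((1, 1, nnzs.size, chn_out))

# Same construction as in the snippet quoted above.
kernel_gthr = np.zeros((1, 1, kernel_chn_in, nnzs.size))
kernel_gthr[0, 0, nnzs, np.arange(nnzs.size)] = 1.0
kernel_shrk = kernel[:, :, nnzs, :]

y_original = conv1x1_nhwc(x_in, kernel)
x_gathered = conv1x1_nhwc(x_in, kernel_gthr)        # keeps only channels in nnzs
y_transformed = conv1x1_nhwc(x_gathered, kernel_shrk)

print(np.allclose(x_gathered, x_in[..., nnzs]))
print(np.allclose(y_original, y_transformed))
```

Both constants are fixed arrays, so nothing related to pruning itself is recomputed at inference; only the two convolutions run.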
thanks.
- Will the model size be reduced? The model adds new variables, such as kernel_gthr and kernel_shrk, and does not delete the old ones.
- Will the amount of computation be reduced? New operations are added, such as
x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME', data_format=data_format)
x = tf.nn.conv2d(
x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)
- Old variables are deleted in *.pb & *.tflite models, since they are not used when computing the final outputs.
- If the kernel size is larger than 1 (e.g. 3x3 conv), the FLOPs will be reduced. For 1x1 conv, we recommend using the "tf.gather + 3 x 3 conv" scheme for graph transformation.
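The FLOPs argument can be checked with a rough count of multiply-accumulates (used here as a proxy for FLOPs). This is only a sketch under assumptions: stride 1 and 'SAME' padding so the output map keeps the input size, and the channel counts and 56x56 feature map are illustrative values, not taken from any model in this thread. The function names are hypothetical.

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulates of a k x k convolution producing an h_out x w_out map."""
    return k * k * c_in * c_out * h_out * w_out

def transformed_macs(k, c_in, c_kept, c_out, h, w):
    """MACs of the two-conv replacement: 1x1 gather conv, then the shrunken k x k conv."""
    gather = conv_macs(1, c_in, c_kept, h, w)
    shrunk = conv_macs(k, c_kept, c_out, h, w)
    return gather + shrunk

# Illustrative layer: 64 input channels pruned to 32, 64 output channels.
c_in, c_kept, c_out, h, w = 64, 32, 64, 56, 56

for k in (3, 1):
    before = conv_macs(k, c_in, c_out, h, w)
    after = transformed_macs(k, c_in, c_kept, c_out, h, w)
    print(f"{k}x{k} conv: {before} MACs before, {after} after, ratio {after / before:.2f}")
```

With these numbers the 3x3 layer gets cheaper, while the 1x1 layer does not, because the extra 1x1 gather conv costs as much as it saves. A tf.gather in its place performs no multiplications.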
These are the pb files generated by export_pb_tflite_models.py. The pb file of the compressed model is bigger than the pb file of the original model.
-rw-rw-r--. 1 zgz zgz 95082914 Jan 25 15:53 model_original.pb
-rw-rw-r--. 1 zgz zgz 104845311 Jan 25 15:53 model_transformed.pb
For the model models_resnet_20_at_cifar_10.tar.gz provided by PocketFlow, the generated model_transformed.pb is not half the size of model_original.pb.
-rw-rw-r--. 1 zgz zgz 1148712 Jan 25 18:14 model_original.pb
-rw-rw-r--. 1 zgz zgz 1028228 Jan 25 18:14 model_transformed.pb
Which model are you using?
I modified the code because it reported the error ValueError: Cannot feed value of shape (1, 224, 224, 3) for Tensor 'import/net_input:0', which has shape '(32, 224, 224, 3)'.
So I changed net['input_data'] = np.zeros(tuple([1] + list(net['input_shape'][1:])), dtype=np.float32) to net['input_data'] = np.zeros(list(net['input_shape']), dtype=np.float32).
The first result uses my custom resnet_v1_50. The second uses resnet_v1_20 provided by PocketFlow.
Which *.ckpt models are you using? Or, what is the value of your --model_dir flag when calling export_pb_tflite_models.py?
This is the log of the first result. The log shows that the channels are pruned.
2019-01-25 18:30:51.087294: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-25 18:30:51.385927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-25 18:30:51.385963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:51.651604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:51.651645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:30:51.651654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:30:51.652008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:input: net_input:0 / output: net_output:0
INFO:tensorflow:input's shape: (32, 224, 224, 3)
INFO:tensorflow:output's shape: (32, 102)
2019-01-25 18:30:53.781431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:53.781483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:53.781493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:30:53.781505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:30:53.781708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:Froze 161 variables.
Converted 161 variables to const ops.
INFO:tensorflow:/home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/model_original.pb generated
2019-01-25 18:30:54.614066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:54.614113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:54.614121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:30:54.614128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:30:54.614337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:input: import/net_input:0 / output: import/net_output:0
INFO:tensorflow:outputs from the *.pb model: [[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
...
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]]
2019-01-25 18:30:57.127393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:57.127461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:57.127473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:30:57.127483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:30:57.127749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:input: net_input:0 / output: net_output:0
INFO:tensorflow:input's shape: (32, 224, 224, 3)
INFO:tensorflow:output's shape: (32, 102)
2019-01-25 18:30:59.110336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:30:59.110386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:30:59.110398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:30:59.110407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:30:59.110653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:transforming OP: model/resnet_v1_50/conv1/Conv2D
INFO:tensorflow:reducing 3 channels to 3
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 64 channels to 32
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 252
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 63
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 253
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 65
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 65
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 256
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 128 channels to 64
INFO:tensorflow:transforming OP: model/resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 128 channels to 65
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 512 channels to 261
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 128
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 512 channels to 255
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 509
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 128
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 513
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 130
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 505
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 130
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 129
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 516
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 128
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 512
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 256 channels to 127
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 1024 channels to 522
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 512 channels to 252
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 512 channels to 253
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/Conv2D
INFO:tensorflow:reducing 1024 channels to 513
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 2048 channels to 1019
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 512 channels to 258
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 512 channels to 260
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/Conv2D
INFO:tensorflow:reducing 2048 channels to 1019
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/Conv2D
INFO:tensorflow:reducing 512 channels to 257
INFO:tensorflow:transforming OP: model/resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/Conv2D
INFO:tensorflow:reducing 512 channels to 259
INFO:tensorflow:transforming OP: model/resnet_v1_50/logits/Conv2D
INFO:tensorflow:reducing 2048 channels to 2048
2019-01-25 18:31:08.059577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:31:08.059645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:31:08.059656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:31:08.059662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:31:08.060404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/best_model.ckpt
INFO:tensorflow:Froze 107 variables.
Converted 107 variables to const ops.
INFO:tensorflow:/home/zgz/project/save_model/pocketflow_test/uniform_pocketflow/best_model/model_transformed.pb generated
2019-01-25 18:31:09.272575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-01-25 18:31:09.272622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-25 18:31:09.272631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-01-25 18:31:09.272638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-01-25 18:31:09.272826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
INFO:tensorflow:input: import/net_input:0 / output: import/net_output:0
INFO:tensorflow:outputs from the *.pb model: [[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
...
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]
[0.00612008 0.01239587 0.00761339 ... 0.00812505 0.00783525 0.01395536]]
I use the model saved in cp_best_path.
For the ResNet-50 model, the 1x1 conv layers in each bottleneck structure cannot be compressed with the "1x1 conv + 3x3 conv" scheme for graph transformation. For 1x1 conv, we recommend using the "tf.gather + 3 x 3 conv" scheme.
- Are the conv layers in the residual branch compressed? And why does the pb file become bigger?
- How do I use the "tf.gather + 3 x 3 conv" scheme?
- The conv ops in the residual branch are compressed, but the size of the *.pb file can sometimes be larger. Assume a 1x1 conv layer with 64 input channels is pruned to 32; the original number of parameters is then 1*1*64*c (c: number of output channels). When using the "1x1 conv + 3x3 conv" scheme, this layer is decomposed into two layers: one 1x1 conv with 64 input channels and 32 output channels, and one 1x1 conv with 32 input channels and c output channels. So the overall number of parameters is 1*1*64*32+1*1*32*c. This is larger than the original count if c is smaller than 64 (the two counts are equal at c = 64).
- Are you using the latest version of PocketFlow? Or, can you find the following code in export_pb_tflite_models.py?
# replace channel pruned convolutional with cheaper operations
if graph_trans_mthd == 'gather':
  x = tf.gather(op.inputs[0], nnzs, axis=1)
  x = tf.nn.conv2d(
    x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)
elif graph_trans_mthd == '1x1_conv':
  x = tf.nn.conv2d(op.inputs[0], kernel_gthr, [1, 1, 1, 1], 'SAME', data_format=data_format)
  x = tf.nn.conv2d(
    x, kernel_shrk, strides, padding, data_format=data_format, dilations=dilations)
else:
  raise ValueError('unrecognized graph transformation method: ' + graph_trans_mthd)
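The parameter-count reasoning in the first point of this reply can be reproduced with a few lines of arithmetic. The sketch below uses the same example (64 input channels pruned to 32, c output channels); the function names are hypothetical, and counting the index vector of the 'gather' branch as c_kept stored values is an assumption about how the constants are serialized.

```python
def params_original(c_in, c_out, k=1):
    """Weights of the original k x k convolution."""
    return k * k * c_in * c_out

def params_1x1_conv_scheme(c_in, c_kept, c_out, k=1):
    """Weights of the '1x1_conv' branch: gather kernel plus shrunken kernel."""
    gather_kernel = 1 * 1 * c_in * c_kept
    shrunk_kernel = k * k * c_kept * c_out
    return gather_kernel + shrunk_kernel

def params_gather_scheme(c_kept, c_out, k=1):
    """Stored values of the 'gather' branch: channel indices plus shrunken kernel."""
    return c_kept + k * k * c_kept * c_out

c_in, c_kept = 64, 32
for c in (32, 64, 256):
    print(c,
          params_original(c_in, c), 
          params_1x1_conv_scheme(c_in, c_kept, c),
          params_gather_scheme(c_kept, c))
```

For a 1x1 layer the '1x1_conv' branch stores more weights than the original whenever c is below 64 and ties at c = 64, whereas the 'gather' branch stays smaller in all three cases.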
sorry, my code is not the latest version. I will pull.
Also, take a look at this PR #119
I read the code in convert_data_format.py. Can NHWC format variables be imported directly into the NCHW format model?
Yes, the layout of the variables is the same.
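A small NumPy sketch of why this works: in TensorFlow the conv2d filter is always laid out as [height, width, in_channels, out_channels], independent of data_format, so only the activations are transposed between the two formats. The example below assumes a 1x1 kernel for brevity and uses made-up shapes; it is an illustration, not code from convert_data_format.py.

```python
import numpy as np

rng = np.random.default_rng(0)

# One kernel, stored as (height, width, in_channels, out_channels).
kernel = rng.standard_normal((1, 1, 8, 4))

# The same activations in both layouts.
x_nhwc = rng.standard_normal((2, 5, 5, 8))
x_nchw = x_nhwc.transpose(0, 3, 1, 2)

# 1x1 convolution in each layout, reusing the identical kernel array.
y_nhwc = np.einsum('nhwc,co->nhwo', x_nhwc, kernel[0, 0])
y_nchw = np.einsum('nchw,co->nohw', x_nchw, kernel[0, 0])

# The outputs agree once the NCHW result is transposed back to NHWC.
same = np.allclose(y_nhwc, y_nchw.transpose(0, 2, 3, 1))
print(same)
```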
I also use this part. After using this method, I found that the new models_transformed.pb had more tensors, like this:
import/GatherV2/indices
import/GatherV2/axis
import/GatherV2
import/Conv2D/filter
import/Conv2D
import/pruned_model/resnet_model/initial_conv
import/pruned_model/resnet_model/batch_normalization/FusedBatchNorm
import/pruned_model/resnet_model/Relu
import/pruned_model/resnet_model/max_pooling2d/MaxPool
import/pruned_model/resnet_model/initial_max_pool
import/GatherV2_1/indices
import/GatherV2_1/axis
import/GatherV2_1
import/Conv2D_1/filter
import/Conv2D_1
import/pruned_model/resnet_model/batch_normalization_1/FusedBatchNorm
import/GatherV2_2/indices
import/GatherV2_2/axis
import/GatherV2_2
import/Conv2D_2/filter
import/Conv2D_2
import/pruned_model/resnet_model/batch_normalization_2/FusedBatchNorm
import/pruned_model/resnet_model/Relu_1
...
It can reduce the inference time. But if I then use this pb for quantization with TensorRT, it becomes slower than before. Have you considered this kind of situation, i.e. quantization after pruning?
- The inference speed of the pruned model is sample/sec: 72+, and the inference speed of the pruned model is sample/sec: 53+. This does not match the description in the PocketFlow documentation. When cp_uniform_preserve_ratio is 0.5, the inference speed of the pruned model is 4 times that of the original model.
- Can tf.gather(op.inputs[0], nnzs, axis=3) be used in an NHWC format model?
@hzhyhx1117 We are considering completely rewriting the graph (conv layer and corresponding input/output layers) to achieve a higher speed-up. But this is harder to make applicable to all models, since the topology can be very complicated.
@as754770178 tf.gather can be used in NHWC format models, but this is less efficient than in NCHW format models.
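As a sketch of what the axis change means, np.take is used below as a stand-in for tf.gather (both select indices along a given axis). The shapes and the nnzs values are illustrative assumptions. Gathering along axis=1 of an NCHW tensor and along axis=3 of the matching NHWC tensor picks the same channels.

```python
import numpy as np

rng = np.random.default_rng(0)
nnzs = np.array([0, 3, 5])                 # indices of the channels that are kept

x_nchw = rng.standard_normal((2, 8, 5, 5))
x_nhwc = np.ascontiguousarray(x_nchw.transpose(0, 2, 3, 1))

g_nchw = np.take(x_nchw, nnzs, axis=1)     # channel axis of NCHW
g_nhwc = np.take(x_nhwc, nnzs, axis=3)     # channel axis of NHWC

# Same values, only the layout differs.
print(np.allclose(g_nchw, g_nhwc.transpose(0, 3, 1, 2)))
```

One plausible reason for the efficiency gap, stated here as an assumption rather than something confirmed in this thread: in NCHW each kept channel is a contiguous H*W block in memory, while in NHWC the kept values are interleaved with the dropped ones.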