Couldn't convert ssd_efficientdet_d0_512x512_coco17_tpu-8 model to int8 tflite model

Open xiang-burlington opened this issue 3 years ago • 5 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[Yes] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[Yes] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[Yes] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/

2. Describe the bug

After training a model with the config file "ssd_efficientdet_d0_512x512_coco17_tpu-8.config", I executed the following steps:

1. Successfully exported the tensorflow model and tflite model with models/research/object_detection/exporter_main_v2.py.
2. Successfully exported the detection SavedModel for tflite conversion with models/research/object_detection/export_tflite_graph_tf2.py.
3. Successfully converted the SavedModel from step 2 to a tflite model with the following Python code, and the generated model worked fine:

converter = tf.lite.TFLiteConverter.from_saved_model(args.model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.float32
converter.inference_output_type = tf.float32
tflite_model = converter.convert()

However, when I tried to generate an int8 tflite model with the following code:

converter = tf.lite.TFLiteConverter.from_saved_model(args.model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.lite.OpsSet.TFLITE_BUILTINS]
converter.inference_input_type = tf.float32
converter.inference_output_type = tf.float32
tflite_model = converter.convert()

it got a "core dumped" error message:

/home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
2022-01-03 16:38:49.121464: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-03 16:38:54.890965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22834 MB memory: -> device: 0, name: TITAN RTX, pci bus id: 0000:1a:00.0, compute capability: 7.5
2022-01-03 16:38:54.893600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22834 MB memory: -> device: 1, name: TITAN RTX, pci bus id: 0000:1b:00.0, compute capability: 7.5
2022-01-03 16:38:54.895871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22834 MB memory: -> device: 2, name: TITAN RTX, pci bus id: 0000:1d:00.0, compute capability: 7.5
2022-01-03 16:38:54.898263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 22834 MB memory: -> device: 3, name: TITAN RTX, pci bus id: 0000:1e:00.0, compute capability: 7.5
2022-01-03 16:38:54.900511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 22834 MB memory: -> device: 4, name: TITAN RTX, pci bus id: 0000:3d:00.0, compute capability: 7.5
2022-01-03 16:38:54.902890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 22834 MB memory: -> device: 5, name: TITAN RTX, pci bus id: 0000:3f:00.0, compute capability: 7.5
2022-01-03 16:38:54.905193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 22834 MB memory: -> device: 6, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5
2022-01-03 16:38:54.907479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 22834 MB memory: -> device: 7, name: TITAN RTX, pci bus id: 0000:5e:00.0, compute capability: 7.5
2022-01-03 16:39:18.611278: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2022-01-03 16:39:18.611330: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
2022-01-03 16:39:18.611351: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:372] Ignored change_concat_input_ranges.
2022-01-03 16:39:18.612511: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
2022-01-03 16:39:18.849818: I tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2022-01-03 16:39:18.849888: I tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
2022-01-03 16:39:19.723416: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
2022-01-03 16:39:22.092031: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
2022-01-03 16:39:23.362740: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 4750232 microseconds.
2022-01-03 16:39:25.914092: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:237] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-01-03 16:39:40.347446: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs

Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
2022-01-03 16:42:13.951302: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs

Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs
./train.sh: line 82: 371560 Segmentation fault (core dumped) python3 convert_savedmodel_to_tflite.py --width=512 --height=512 --model_dir=${input_dir} --tflite_filename=${output_dir}/tf270_${model_name}_${today}.tflite
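A segmentation fault in native code leaves no Python traceback, so the log does not show which Python call was active when the process died. One generic way to recover that information is the standard-library faulthandler module, sketched below. Where it is enabled inside convert_savedmodel_to_tflite.py is an assumption, since that script is not included in this report, and main() here is only a hypothetical placeholder. This narrows down the failing call; it does not fix the crash.

```python
import faulthandler
import sys

# Dump the Python-level traceback of every thread if the process receives a
# fatal signal such as SIGSEGV. sys.__stderr__ is the interpreter's original
# stderr, so the dump is still written if sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def main():
    # Hypothetical placeholder: the body of convert_savedmodel_to_tflite.py
    # (building the converter and calling converter.convert()) would go here.
    pass


if __name__ == "__main__":
    main()
```

The same effect is available without editing the script by running it as `python3 -X faulthandler convert_savedmodel_to_tflite.py ...`. The log also notes that the MLIR crash reproducer is disabled and can be enabled by setting the MLIR_CRASH_REPRODUCER_DIRECTORY environment variable.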

3. Steps to reproduce

A. The .sh script used for training and model conversion:

model_name=ssd_efficientdet_d0_512x512_coco17_tpu-8
today="1229"

rm -rf facessd/${model_name}${today}/model_dir
python3 models/research/object_detection/model_main_tf2.py \
  --num_workers=8 \
  --checkpoint_every_n=5000 \
  --pipeline_config_path=config/${model_name}.config \
  --model_dir=facessd/${model_name}${today}/model_dir \
  --alsologtostderr \
  --num_train_steps=300000 \
  --sample_1_of_n_eval_examples=30

python3 models/research/object_detection/exporter_main_v2.py \
  --input_type image_tensor \
  --pipeline_config_path config/${model_name}.config \
  --trained_checkpoint_dir facessd/${model_name}${today}/model_dir/ \
  --output_directory facessd/export/${model_name}${today}/

python3 models/research/object_detection/export_tflite_graph_tf2.py \
  --pipeline_config_path /home/experiment/8bit_tensorflow/facessd/export/${model_name}${today}/pipeline.config \
  --trained_checkpoint_dir facessd/export/${model_name}${today}/checkpoint/ \
  --output_directory facessd/export/${model_name}_${today}_lite/

output_dir=/home/experiment/8bit_tensorflow/facessd/export
input_dir=${output_dir}/${model_name}_${today}_lite/saved_model
python3 convert_savedmodel_to_tflite.py \
  --width=512 \
  --height=512 \
  --model_dir=${input_dir} \
  --tflite_filename=${output_dir}/tf270_${model_name}_${today}.tflite
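The script convert_savedmodel_to_tflite.py and the representative_dataset function it uses are not included in this report. The following is a minimal sketch of what such a script could look like, assuming only the four flags passed above and the converter settings quoted in section 2. The helper names (parse_args, make_representative_dataset, convert), the sample count, and the random calibration data are hypothetical stand-ins, not the code that was actually run.

```python
import argparse


def parse_args(argv=None):
    """Parse the four flags that the shell script passes."""
    parser = argparse.ArgumentParser(description="SavedModel to TFLite (sketch)")
    parser.add_argument("--width", type=int, default=512)
    parser.add_argument("--height", type=int, default=512)
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--tflite_filename", required=True)
    return parser.parse_args(argv)


def make_representative_dataset(height, width, num_samples=100):
    """Return a generator function for converter.representative_dataset.

    ASSUMPTION: random pixels are only a placeholder. Real calibration should
    feed training images, with the value range and preprocessing that the
    exported SavedModel expects.
    """
    import numpy as np  # deferred so that flag parsing works without NumPy

    def representative_dataset():
        rng = np.random.default_rng(0)
        for _ in range(num_samples):
            image = rng.uniform(0.0, 255.0, size=(1, height, width, 3))
            # One list entry per model input: float32 with a batch dimension.
            yield [image.astype(np.float32)]

    return representative_dataset


def convert(args):
    """Run the int8 conversion with the settings quoted in section 2."""
    import tensorflow as tf  # deferred: importing TensorFlow is slow

    converter = tf.lite.TFLiteConverter.from_saved_model(args.model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_dataset(
        args.height, args.width
    )
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
        tf.lite.OpsSet.TFLITE_BUILTINS,
    ]
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32
    with open(args.tflite_filename, "wb") as f:
        f.write(converter.convert())
```

Calling `convert(parse_args())` at the end of the file would run the conversion with the flags from the command line.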

B. The training config file:

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 512
        width: 512
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 64
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 3
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b0_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 3
        num_filters: 64
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: true
  num_steps: 300000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 512
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 8e-2
          total_steps: 300000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "tfrecord/head_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "tfrecord/head_1024x512_train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}

eval_input_reader: {
  label_map_path: "tfrecord/head_label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "tfrecord/head_1024x512_test.tfrecord"
  }
}

graph_rewriter {
  quantization {
    delay: 2000
    weight_bits: 8
    activation_bits: 8
  }
}

4. Expected behavior

The trained model can be converted to an int8 tflite model without crashing.
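Once a conversion succeeds, one way to check that the result is actually quantized is to count tensor dtypes in the generated file. The sketch below assumes the dictionaries returned by tf.lite.Interpreter.get_tensor_details(), whose "dtype" entry is a NumPy type; the helper names summarize_dtypes and load_tensor_details are hypothetical. With float32 input/output types and the TFLITE_BUILTINS fallback used in section 2, some float32 tensors are expected to remain, so a mix of int8 and float32 is not by itself a failure.

```python
from collections import Counter


def summarize_dtypes(tensor_details):
    """Count tensors per dtype name, e.g. Counter({'int8': 300, 'float32': 12})."""
    return Counter(
        # NumPy scalar types expose their name via __name__ ('int8', 'float32').
        getattr(detail["dtype"], "__name__", str(detail["dtype"]))
        for detail in tensor_details
    )


def load_tensor_details(tflite_path):
    """Read per-tensor metadata from a .tflite file."""
    import tensorflow as tf  # deferred: only needed when inspecting a real file

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    return interpreter.get_tensor_details()
```

For a real file, `summarize_dtypes(load_tensor_details(path))` gives the per-dtype tensor counts.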

5. Additional context

I also tried using relu6 as the activation instead of SWISH; the result was the same.

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
Mobile device name if the issue happens on a mobile device: PC
TensorFlow installed from (source or binary): Binary
TensorFlow version (use command below): v2.7.0-rc1-69-gc256c071bb2 2.7.0
Python version: 3.8.5
Bazel version (if compiling from source): 3.4.1
GCC/Compiler version (if compiling from source): Ubuntu 9.3.0-17ubuntu1~20.04
CUDA/cuDNN version: Cuda 10.1, cudnn 8.1
GPU model and memory: Titan RTX 24G

Jan 03 '22 21:01 xiang-burlington
