
Couldn't convert ssd_efficientdet_d0_512x512_coco17_tpu-8 model to int8 tflite model

Open xiang-burlington opened this issue 3 years ago • 5 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Yes] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • [Yes] I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • [Yes] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/

2. Describe the bug

After training a model with the config file "ssd_efficientdet_d0_512x512_coco17_tpu-8.config", I executed the following steps:

  1. Successfully exported the tensorflow model and tflite model with models/research/object_detection/exporter_main_v2.py.
  2. Successfully exported the detection SavedModel for tflite conversion with models/research/object_detection/export_tflite_graph_tf2.py.
  3. Successfully converted the saved model from step 2 to tflite model with the following python code, and the generated model worked fine:

    converter = tf.lite.TFLiteConverter.from_saved_model(args.model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32
    tflite_model = converter.convert()

However, when I tried to generate an int8 tflite model with the following code,

    converter = tf.lite.TFLiteConverter.from_saved_model(args.model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.lite.OpsSet.TFLITE_BUILTINS]
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32
    tflite_model = converter.convert()

it got a "core dumped" error message:

    /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
    2022-01-03 16:38:49.121464: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-01-03 16:38:54.890965: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22834 MB memory: -> device: 0, name: TITAN RTX, pci bus id: 0000:1a:00.0, compute capability: 7.5
    2022-01-03 16:38:54.893600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22834 MB memory: -> device: 1, name: TITAN RTX, pci bus id: 0000:1b:00.0, compute capability: 7.5
    2022-01-03 16:38:54.895871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22834 MB memory: -> device: 2, name: TITAN RTX, pci bus id: 0000:1d:00.0, compute capability: 7.5
    2022-01-03 16:38:54.898263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 22834 MB memory: -> device: 3, name: TITAN RTX, pci bus id: 0000:1e:00.0, compute capability: 7.5
    2022-01-03 16:38:54.900511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 22834 MB memory: -> device: 4, name: TITAN RTX, pci bus id: 0000:3d:00.0, compute capability: 7.5
    2022-01-03 16:38:54.902890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 22834 MB memory: -> device: 5, name: TITAN RTX, pci bus id: 0000:3f:00.0, compute capability: 7.5
    2022-01-03 16:38:54.905193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 22834 MB memory: -> device: 6, name: TITAN RTX, pci bus id: 0000:41:00.0, compute capability: 7.5
    2022-01-03 16:38:54.907479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 22834 MB memory: -> device: 7, name: TITAN RTX, pci bus id: 0000:5e:00.0, compute capability: 7.5
    2022-01-03 16:39:18.611278: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
    2022-01-03 16:39:18.611330: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
    2022-01-03 16:39:18.611351: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:372] Ignored change_concat_input_ranges.
    2022-01-03 16:39:18.612511: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
    2022-01-03 16:39:18.849818: I tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
    2022-01-03 16:39:18.849888: I tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
    2022-01-03 16:39:19.723416: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
    2022-01-03 16:39:22.092031: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: /home/xiangdong/experiment/8bit_tensorflow/facessd/export/ssd_efficientdet_d0_512x512_coco17_tpu-8_1229_lite/saved_model
    2022-01-03 16:39:23.362740: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 4750232 microseconds.
    2022-01-03 16:39:25.914092: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:237] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
    2022-01-03 16:39:40.347446: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs

    Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs
    fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
    2022-01-03 16:42:13.951302: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs

    Estimated count of arithmetic ops: 6.784 G ops, equivalently 3.392 G MACs
    ./train.sh: line 82: 371560 Segmentation fault (core dumped) python3 convert_savedmodel_to_tflite.py --width=512 --height=512 --model_dir=${input_dir} --tflite_filename=${output_dir}/tf270_${model_name}_${today}.tflite
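
The int8 snippet above assigns `converter.representative_dataset = representative_dataset`, but the generator itself is not shown in this report. For readers trying to reproduce the problem, a minimal sketch of what such a generator might look like follows. It assumes the exported SavedModel takes a single float32 input of shape [1, 512, 512, 3] (matching the --width and --height flags), and it yields random pixels purely as a placeholder. The sample count, the seed, and the 0 to 255 value range are assumptions; real calibration should draw on actual training images.

```python
import numpy as np


def representative_dataset(num_samples=100, height=512, width=512, seed=0):
    """Yield calibration batches for the TFLite converter.

    Hypothetical stand-in: random pixels instead of real images.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_samples):
        # One float32 batch of shape [1, height, width, 3];
        # the 0..255 value range is an assumption, not taken from the report.
        image = rng.uniform(0.0, 255.0, size=(1, height, width, 3)).astype(np.float32)
        # The converter expects a list with one array per model input.
        yield [image]
```

Because every parameter has a default, the function can be assigned to the converter directly, as in the snippet above.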

3. Steps to reproduce

A. The .sh script used for training and model conversion:

    model_name=ssd_efficientdet_d0_512x512_coco17_tpu-8
    today="1229"

    rm -rf facessd/${model_name}_${today}/model_dir
    python3 models/research/object_detection/model_main_tf2.py \
      --num_workers=8 \
      --checkpoint_every_n=5000 \
      --pipeline_config_path=config/${model_name}.config \
      --model_dir=facessd/${model_name}_${today}/model_dir \
      --alsologtostderr \
      --num_train_steps=300000 \
      --sample_1_of_n_eval_examples=30

    python3 models/research/object_detection/exporter_main_v2.py \
      --input_type image_tensor \
      --pipeline_config_path config/${model_name}.config \
      --trained_checkpoint_dir facessd/${model_name}_${today}/model_dir/ \
      --output_directory facessd/export/${model_name}_${today}/

    python3 models/research/object_detection/export_tflite_graph_tf2.py \
      --pipeline_config_path /home/experiment/8bit_tensorflow/facessd/export/${model_name}_${today}/pipeline.config \
      --trained_checkpoint_dir facessd/export/${model_name}_${today}/checkpoint/ \
      --output_directory facessd/export/${model_name}_${today}_lite/

    output_dir=/home/experiment/8bit_tensorflow/facessd/export
    input_dir=${output_dir}/${model_name}_${today}_lite/saved_model
    python3 convert_savedmodel_to_tflite.py \
      --width=512 \
      --height=512 \
      --model_dir=${input_dir} \
      --tflite_filename=${output_dir}/tf270_${model_name}_${today}.tflite

B. The training config file:

    model {
      ssd {
        inplace_batchnorm_update: true
        freeze_batchnorm: false
        num_classes: 1
        add_background_class: false
        box_coder {
          faster_rcnn_box_coder {
            y_scale: 10.0
            x_scale: 10.0
            height_scale: 5.0
            width_scale: 5.0
          }
        }
        matcher {
          argmax_matcher {
            matched_threshold: 0.5
            unmatched_threshold: 0.5
            ignore_thresholds: false
            negatives_lower_than_unmatched: true
            force_match_for_each_row: true
            use_matmul_gather: true
          }
        }
        similarity_calculator {
          iou_similarity {
          }
        }
        encode_background_as_zeros: true
        anchor_generator {
          multiscale_anchor_generator {
            min_level: 3
            max_level: 7
            anchor_scale: 4.0
            aspect_ratios: [1.0, 2.0, 0.5]
            scales_per_octave: 3
          }
        }
        image_resizer {
          fixed_shape_resizer {
            height: 512
            width: 512
          }
        }
        box_predictor {
          weight_shared_convolutional_box_predictor {
            depth: 64
            class_prediction_bias_init: -4.6
            conv_hyperparams {
              force_use_bias: true
              activation: SWISH
              regularizer {
                l2_regularizer {
                  weight: 0.00004
                }
              }
              initializer {
                random_normal_initializer {
                  stddev: 0.01
                  mean: 0.0
                }
              }
              batch_norm {
                scale: true
                decay: 0.99
                epsilon: 0.001
              }
            }
            num_layers_before_predictor: 3
            kernel_size: 3
            use_depthwise: true
          }
        }
        feature_extractor {
          type: 'ssd_efficientnet-b0_bifpn_keras'
          bifpn {
            min_level: 3
            max_level: 7
            num_iterations: 3
            num_filters: 64
          }
          conv_hyperparams {
            force_use_bias: true
            activation: SWISH
            regularizer {
              l2_regularizer {
                weight: 0.00004
              }
            }
            initializer {
              truncated_normal_initializer {
                stddev: 0.03
                mean: 0.0
              }
            }
            batch_norm {
              scale: true,
              decay: 0.99,
              epsilon: 0.001,
            }
          }
        }
        loss {
          classification_loss {
            weighted_sigmoid_focal {
              alpha: 0.25
              gamma: 1.5
            }
          }
          localization_loss {
            weighted_smooth_l1 {
            }
          }
          classification_weight: 1.0
          localization_weight: 1.0
        }
        normalize_loss_by_num_matches: true
        normalize_loc_loss_by_codesize: true
        post_processing {
          batch_non_max_suppression {
            score_threshold: 1e-8
            iou_threshold: 0.5
            max_detections_per_class: 100
            max_total_detections: 100
          }
          score_converter: SIGMOID
        }
      }
    }

    train_config: {
      batch_size: 16
      sync_replicas: true
      startup_delay_steps: 0
      replicas_to_aggregate: 8
      use_bfloat16: true
      num_steps: 300000
      data_augmentation_options {
        random_horizontal_flip {
        }
      }
      data_augmentation_options {
        random_scale_crop_and_pad_to_square {
          output_size: 512
          scale_min: 0.1
          scale_max: 2.0
        }
      }
      optimizer {
        momentum_optimizer: {
          learning_rate: {
            cosine_decay_learning_rate {
              learning_rate_base: 8e-2
              total_steps: 300000
              warmup_learning_rate: .001
              warmup_steps: 2500
            }
          }
          momentum_optimizer_value: 0.9
        }
        use_moving_average: false
      }
      max_number_of_boxes: 100
      unpad_groundtruth_tensors: false
    }

    train_input_reader: {
      label_map_path: "tfrecord/head_label_map.pbtxt"
      tf_record_input_reader {
        input_path: "tfrecord/head_1024x512_train.tfrecord"
      }
    }

    eval_config: {
      metrics_set: "coco_detection_metrics"
      use_moving_averages: false
      batch_size: 1
    }

    eval_input_reader: {
      label_map_path: "tfrecord/head_label_map.pbtxt"
      shuffle: false
      num_epochs: 1
      tf_record_input_reader {
        input_path: "tfrecord/head_1024x512_test.tfrecord"
      }
    }

    graph_rewriter {
      quantization {
        delay: 2000
        weight_bits: 8
        activation_bits: 8
      }
    }

4. Expected behavior

That we can convert the trained model to an int8 tflite model.

5. Additional context

I also tried relu6 as the activation instead of SWISH; the result was the same.
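
The conversion log in section 2 notes that the MLIR crash reproducer is disabled unless the `MLIR_CRASH_REPRODUCER_DIRECTORY` environment variable is set. A small sketch of how the conversion script could turn it on is given below, together with Python's standard `faulthandler` module so that a segmentation fault also prints the Python stack. The helper name and the default directory are hypothetical, and calling it before TensorFlow is imported is a conservative assumption rather than a documented requirement.

```python
import faulthandler
import os
import tempfile


def enable_crash_diagnostics(repro_dir=None):
    """Turn on extra diagnostics before running the TFLite conversion.

    Hypothetical helper; returns the directory used for MLIR reproducers.
    """
    if repro_dir is None:
        # Default location is an assumption; any writable directory should do.
        repro_dir = os.path.join(tempfile.gettempdir(), "mlir_crash_repro")
    os.makedirs(repro_dir, exist_ok=True)
    # Environment variable named in the conversion log.
    os.environ["MLIR_CRASH_REPRODUCER_DIRECTORY"] = repro_dir
    # Dump the Python traceback of all threads on SIGSEGV and similar signals.
    faulthandler.enable(all_threads=True)
    return repro_dir
```

Whether the reproducer directory actually receives output depends on where the crash occurs, so an empty directory after the segmentation fault would itself be a useful data point.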

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Mobile device name if the issue happens on a mobile device: PC
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): v2.7.0-rc1-69-gc256c071bb2 2.7.0
  • Python version: 3.8.5
  • Bazel version (if compiling from source): 3.4.1
  • GCC/Compiler version (if compiling from source): Ubuntu 9.3.0-17ubuntu1~20.04
  • CUDA/cuDNN version: Cuda 10.1, cudnn 8.1
  • GPU model and memory: Titan RTX 24G

xiang-burlington • Jan 03 '22 21:01