tensorflow-yolov4-tflite

Here is a solution on how to run the YOLOv4 TFLite int8 model

murdockhou opened this issue 3 years ago · 35 comments

Hi, thanks for your nice work. As you say, YOLOv4 and YOLOv4-tiny int8 quantization have some issues that you will try to fix. I have a solution for this, and the code is below:

First, when converting the Darknet weights to TensorFlow, we should not include the post-processing in the saved_model:

# save_model.py

import tensorflow as tf
from absl import app, flags, logging
from absl.flags import FLAGS
from core.yolov4 import YOLO, decode, filter_boxes
import core.utils as utils
from core.config import cfg

flags.DEFINE_string('weights', './data/yolov4.weights', 'path to weights file')
flags.DEFINE_string('output', './checkpoints/yolov4-416', 'path to output')
flags.DEFINE_boolean('tiny', False, 'is yolo-tiny or not')
flags.DEFINE_integer('input_size', 416, 'define input size of export model')
flags.DEFINE_float('score_thres', 0.2, 'define score threshold')
flags.DEFINE_string('framework', 'tf', 'define what framework do you want to convert (tf, trt, tflite)')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')

def save_tf():
  STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)

  input_layer = tf.keras.layers.Input([FLAGS.input_size, FLAGS.input_size, 3])
  feature_maps = YOLO(input_layer, NUM_CLASS, FLAGS.model, FLAGS.tiny)
  # NOTE: the decode/filter_boxes post-processing below is deliberately
  # commented out so the saved_model ends at the raw feature maps
  # bbox_tensors = []
  # prob_tensors = []
  # if FLAGS.tiny:
  #   for i, fm in enumerate(feature_maps):
  #     if i == 0:
  #       output_tensors = decode(fm, FLAGS.input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     else:
  #       output_tensors = decode(fm, FLAGS.input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     bbox_tensors.append(output_tensors[0])
  #     prob_tensors.append(output_tensors[1])
  # else:
  #   for i, fm in enumerate(feature_maps):
  #     if i == 0:
  #       output_tensors = decode(fm, FLAGS.input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     elif i == 1:
  #       output_tensors = decode(fm, FLAGS.input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     else:
  #       output_tensors = decode(fm, FLAGS.input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, FLAGS.framework)
  #     bbox_tensors.append(output_tensors[0])
  #     prob_tensors.append(output_tensors[1])
  # pred_bbox = tf.concat(bbox_tensors, axis=1)
  # pred_prob = tf.concat(prob_tensors, axis=1)
  # if FLAGS.framework == 'tflite':
  #   pred = (pred_bbox, pred_prob)
  # else:
  #   boxes, pred_conf = filter_boxes(pred_bbox, pred_prob, score_threshold=FLAGS.score_thres, input_shape=tf.constant([FLAGS.input_size, FLAGS.input_size]))
  #   pred = tf.concat([boxes, pred_conf], axis=-1)
  model = tf.keras.Model(input_layer, feature_maps)
  utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
  model.summary()
  model.save(FLAGS.output)

def main(_argv):
  save_tf()

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

Then, run this command to convert it to TFLite int8:

python convert_tflite.py --weights ./checkpoints/yolov4-416 --output ./checkpoints/yolov4-416-int8.tflite --quantize_mode int8 

I disabled the --dataset ./coco_dataset/coco/val2017.txt parameter since I do not have the COCO dataset; you should also comment out line 43.
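For context, full-integer quantization needs a calibration (representative) dataset; skipping --dataset means the converter has nothing to calibrate activation ranges with, which may be why some people below hit "Model not quantized". A minimal sketch of what the int8 path needs (the paths, input size, and dummy data here are assumptions; in practice, real images are needed for meaningful ranges):

import numpy as np
import tensorflow as tf

def representative_data_gen():
    # placeholder calibration data so the sketch is self-contained;
    # in practice, yield ~100 real preprocessed images instead
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('./checkpoints/yolov4-416')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_data_gen
open('./checkpoints/yolov4-416-int8.tflite', 'wb').write(converter.convert())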

Finally, add some code to detect.py (this is the post-processing code we commented out in the first step):

if FLAGS.framework == 'tflite':
        interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        print(input_details)
        print(output_details)
        interpreter.set_tensor(input_details[0]['index'], images_data)
        interpreter.invoke()
        pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
        # add post process code here
        bbox_tensors = []
        prob_tensors = []
        # NOTE: the TFLite output tensors are not in feature-map order,
        # hence the pred[2] / pred[0] / pred[1] indexing below
        for i, fm in enumerate(pred):
            if i == 0:
                output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
            elif i == 1:
                output_tensors = decode(pred[0], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
            else:
                output_tensors = decode(pred[1], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
            bbox_tensors.append(output_tensors[0])
            prob_tensors.append(output_tensors[1])
        pred_bbox = tf.concat(bbox_tensors, axis=1)
        pred_prob = tf.concat(prob_tensors, axis=1)
        pred = (pred_bbox, pred_prob)

        if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
            boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
        else:
            boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
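After filter_boxes, the remaining steps are the same as the stock detect.py; roughly (this mirrors the repo's detect.py, with FLAGS.iou and FLAGS.score as its thresholds):

boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
    boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
    scores=tf.reshape(pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
    max_output_size_per_class=50,
    max_total_size=50,
    iou_threshold=FLAGS.iou,
    score_threshold=FLAGS.score)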

Run python detect.py --weights ./checkpoints/yolov4-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite and we will get a result like this:

[image: result-416-int8]

Hope that can help you.

murdockhou avatar Aug 23 '20 13:08 murdockhou

Were you able to fully compile the model for the Edge TPU? I was only able to get 39 supported ops + 8 that will run on the CPU for YOLOv4-tiny.

alexanderswerdlow avatar Aug 25 '20 18:08 alexanderswerdlow

@alexanderswerdlow I only tried this on CPU; I don't know how compatible it is with the TPU.

murdockhou avatar Aug 26 '20 03:08 murdockhou

@alexanderswerdlow The total number of layers is only 47?

hhk7734 avatar Aug 26 '20 05:08 hhk7734

@hhk7734 Yep. I'll double-check tomorrow, but I'm almost positive. The only significant change I made was LeakyReLU -> ReLU. I was able to get it running on the Edge TPU, but it runs quite horribly compared to even YOLOv3-tiny: the speed is fine at ~20 ms per frame, but the accuracy is very poor.

alexanderswerdlow avatar Aug 26 '20 06:08 alexanderswerdlow

@alexanderswerdlow Can you share the .tflite file (not the _edgetpu.tflite)? I want to compare it with the layers I created.

hhk7734 avatar Aug 26 '20 07:08 hhk7734

Can't share the tflite, but happy to share my config. It's basically the stock config, just without LeakyReLU, at 416x416, with the right filter value, etc.

yolo-tiny-v4-obj.txt

Edit: if you're curious, here's my edgetpu_compiler output:

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 599 ms.

Input model: v4.tflite
Input size: 5.72MiB
Output model: v4_edgetpu.tflite
Output size: 5.75MiB
On-chip memory used for caching model parameters: 4.82MiB
On-chip memory remaining for caching model parameters: 1.88MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 47
Operation log: v4_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 39
Number of operations that will run on CPU: 8

Operator                       Count      Status

PAD                            2          Mapped to Edge TPU
CONCATENATION                  1          More than one subgraph is not supported
CONCATENATION                  6          Mapped to Edge TPU
SPLIT                          3          Mapped to Edge TPU
MAX_POOL_2D                    3          Mapped to Edge TPU
DEQUANTIZE                     2          Operation is working on an unsupported data type
RESIZE_BILINEAR                1          Operation version not supported
QUANTIZE                       6          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
QUANTIZE                       1          More than one subgraph is not supported
CONV_2D                        19         Mapped to Edge TPU
CONV_2D                        2          More than one subgraph is not supported

alexanderswerdlow avatar Aug 27 '20 22:08 alexanderswerdlow

@alexanderswerdlow If you have time, please take a look at the links below. https://github.com/hhk7734/tensorflow-yolov4/blob/master/test/make_edgetpu_tflite.ipynb https://github.com/hhk7734/tensorflow-yolov4/issues/20

using input_size=416, inference time is ~60ms.

hhk7734 avatar Aug 28 '20 10:08 hhk7734

@alexanderswerdlow how did you manage to convert the yolo model? I'm using the same config file as you, and I managed to run the model with detect.py after quantizing it with python convert_tflite.py --weights ./checkpoints/yolov4 --output ./checkpoints/yolov4-int8.tflite --quantize_mode int8

But if I try to convert it with edgetpu_compiler ./checkpoints/yolov4-int8.tflite

I get

Edge TPU Compiler version 14.1.317412892
Invalid model: ./checkpoints/yolov4-int8.tflite
Model not quantized

I thought that running convert_tflite.py with --quantize_mode int8 takes care of this, so I don't really get what I'm missing.

ItsMeTheBee avatar Sep 01 '20 13:09 ItsMeTheBee

@alexanderswerdlow @murdockhou I would like to know this too. I've been trying to convert YOLOv3 and v4 weights to tflite (fully int8 quantized) using these helpful steps, but sadly I didn't get a result. I get the exact same result as @ItsMeTheBee above. I am a beginner and would like to use this for a project. Any guidance would be extremely appreciated.

JimBratsos avatar Sep 04 '20 17:09 JimBratsos

Tell me please, how do I use this int8 model in an Android/iOS project? I get this error in my Android project: Cannot copy from a TensorFlowLite tensor (Identity) with shape [1, 13, 13, 993] to a Java object with shape [1, 2535, 4].

The float16 and float32 models run OK.
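The shape mismatch suggests the int8 model still outputs raw feature maps (post-processing stripped, as in the first step above), so the Java-side output buffers must be allocated to match those shapes rather than [1, 2535, 4]. A quick way to check what the app must allocate (a sketch; the model path is an assumption):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='./checkpoints/yolov4-416-int8.tflite')
interpreter.allocate_tensors()
for d in interpreter.get_output_details():
    # raw heads look like (1, 13, 13, C), (1, 26, 26, C), ... when decoding was removed
    print(d['name'], d['shape'], d['dtype'])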

4yougames avatar Sep 05 '20 11:09 4yougames

No idea. It does not compile with the Coral compiler, as it claims the model needs quantization even though I already quantized it.

JimBratsos avatar Sep 06 '20 00:09 JimBratsos

Okay a little something about that:

I was able to compile the model after some modifications in core/common.py and convert_tflite.py. In core/common.py I changed leaky ReLU to ReLU.

In convert_tflite.py I tried using the old converter:

elif FLAGS.quantize_mode == 'int8':
    converter.experimental_new_converter = False
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
    converter.allow_custom_ops = True
    converter.representative_dataset = representative_data_gen

With this, the edgetpu_compiler can map 17 ops:

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 340 ms.

Input model: checkpoints/yolov3-1800-int8.tflite
Input size: 8.42MiB
Output model: yolov3-1800-int8_edgetpu.tflite
Output size: 8.44MiB
On-chip memory used for caching model parameters: 1.84MiB
On-chip memory remaining for caching model parameters: 256.75KiB
Off-chip memory used for streaming uncached model parameters: 5.64MiB
Number of Edge TPU subgraphs: 1
Total number of operations: 135
Operation log: yolov3-1800-int8_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 17
Number of operations that will run on CPU: 118

Operator                       Count      Status

RESHAPE                        18         More than one subgraph is not supported
ADD                            6          More than one subgraph is not supported
MUL                            18         More than one subgraph is not supported
DEQUANTIZE                     8          Operation is working on an unsupported data type
CONV_2D                        2          More than one subgraph is not supported
CONV_2D                        11         Mapped to Edge TPU
CONCATENATION                  9          More than one subgraph is not supported
STRIDED_SLICE                  12         More than one subgraph is not supported
MAX_POOL_2D                    6          Mapped to Edge TPU
LOGISTIC                       12         More than one subgraph is not supported
QUANTIZE                       17         More than one subgraph is not supported
QUANTIZE                       7          Operation is otherwise supported, but not mapped due to some unspecified limitation
SPLIT_V                        2          Operation not supported
RESIZE_BILINEAR                1          Operation version not supported
EXP                            6          Operation is working on an unsupported data type

Now, this was done for a tiny v3 model, so it might not work for you, but you can still give it a try =) I've been hoping to get better conversion performance, so I'd still be glad for any hint in the right direction.

ItsMeTheBee avatar Sep 17 '20 12:09 ItsMeTheBee

Wow, great work there. Have you tested it on the Coral? What FPS do you get, @ItsMeTheBee?

EDIT: I tried following your steps, but adding the line converter.representative_dataset = representative_data_gen caused an error about min/max tensors. Any idea for a workaround? I renamed all the leaky_relu instances to relu and changed tf.nn.leaky_relu(conv, alpha=something) to tf.nn.relu(conv) in common.py. I also made the changes specified in convert_tflite.py, except for that line, to avoid the error. After that, I tried to compile the tflite model, and the compiler stated the model was not quantized.
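For reference, the activation swap described above is a one-line change in core/common.py (illustrative; the exact line and alpha value differ between repo versions, and the .weights file should also have been trained with ReLU, as in the modified cfg):

# before (LeakyReLU is not mapped to the Edge TPU):
# conv = tf.nn.leaky_relu(conv, alpha=0.1)
# after:
conv = tf.nn.relu(conv)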

JimBratsos avatar Sep 17 '20 13:09 JimBratsos

Hi @murdockhou,
after using your method I successfully ran the YOLOv4 int8 model and detected kite.jpg. But with the yolov4-tiny int8 version, the model converted successfully, yet an error occurred when running detect.py. The error is as follows:

[screenshot: error 1]

TensorFlow version: tensorflow-gpu 2.3.0 / 2.3.0rc0 (all tested)
Command: python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

w840401 avatar Sep 29 '20 10:09 w840401

Hey @w840401, I've had this error too. It occurs because the code expects more outputs than your model actually has: our model has 2 output branches while the code expects 3. Sadly I do not know the solution, but I'd like one too, @murdockhou.

JimBratsos avatar Sep 29 '20 13:09 JimBratsos

I naively thought it was caused by an OS problem, but the same error occurred when I switched from Windows to Linux 😆 Please, somebody help us. @murdockhou [screenshot: Linux error]

w840401 avatar Sep 30 '20 07:09 w840401

I tried using the COCO dataset to convert to tflite, but I got empty min/max tensors. I really want to know how you got this to work, @murdockhou. As I said, if I omit the converter.representative_dataset = representative_data_gen line, I fail to quantize the model: the script runs, but if I compile with the Coral model compiler, an error pops up saying the model is not quantized. Also, if I try to run deepsort with this model, no bounding boxes are produced, which is weird. Could you help?

JimBratsos avatar Oct 11 '20 15:10 JimBratsos


@JimBratsos I successfully solved the problem you mentioned; it was caused by the val2017.txt path. If you re-run with the correct coco_dataset path, it should be fine.

w840401 avatar Oct 12 '20 10:10 w840401

Hey @w840401, I tried changing the path of val2017.txt, but it still has the same problem.

RuntimeError: Max and min for dynamic tensors should be recorded during calibration: Failed for tensor StatefulPartitionedCall/functional_1/zero_padding2d/Pad Empty min/max for tensor StatefulPartitionedCall/functional_1/zero_padding2d/Pad

Any ideas? My val2017.txt is in the same folder as the pictures and points to them, for example:

000000000139.jpg ..... for this specific picture.

Thanks for the help
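For what it's worth, this "Empty min/max" error typically means the representative dataset generator never yielded a usable sample (e.g. every image path failed to load), so calibration never recorded ranges for those tensors. A minimal generator sketch, with the annotation format and preprocessing as assumptions:

import cv2
import numpy as np

def representative_data_gen():
    with open('./data/dataset/val2017.txt') as f:
        # assumed: each line starts with an image path, as in this repo's annotation files
        paths = [line.strip().split()[0] for line in f if line.strip()]
    yielded = 0
    for path in paths:
        img = cv2.imread(path)
        if img is None:
            continue  # if every path is wrong, nothing is yielded -> empty min/max
        img = cv2.resize(img, (416, 416)).astype(np.float32) / 255.0
        yield [np.expand_dims(img, axis=0)]
        yielded += 1
        if yielded >= 100:
            break

Printing how many images were actually yielded is a quick sanity check before blaming the converter.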

JimBratsos avatar Oct 12 '20 11:10 JimBratsos


@JimBratsos I don't think it needs to be in the same folder as the pictures. Maybe that's why it can't point to them!

Quantization still had problems after this. I did not use tensorflow 2.3.0; I used tf-nightly. Resource: https://pypi.org/project/tf-nightly/

My English is poor, sorry!

w840401 avatar Oct 14 '20 02:10 w840401

When using the yolov4-tiny int8 version and running the model with detect.py after quantizing it:

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

I got this error too.

output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
IndexError: list index out of range

Then I added additional code in detect.py, and I can successfully detect images now. The same applies to detectvideo.py.

        # add post process code here    
        bbox_tensors = []
        prob_tensors = []
        if FLAGS.tiny:
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[1], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[0], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])
        else:
            for i, fm in enumerate(pred):
                if i == 0:
                    output_tensors = decode(pred[2], input_size // 8, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                elif i == 1:
                    output_tensors = decode(pred[0], input_size // 16, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                else:
                    output_tensors = decode(pred[1], input_size // 32, NUM_CLASS, STRIDES, ANCHORS, i, XYSCALE, 'tflite')
                bbox_tensors.append(output_tensors[0])
                prob_tensors.append(output_tensors[1])

        pred_bbox = tf.concat(bbox_tensors, axis=1)
        pred_prob = tf.concat(prob_tensors, axis=1)
        pred = (pred_bbox, pred_prob)

Although the yolov4-tiny int8 model now detects successfully, it runs slowly: about 300 ms per image on CPU. Does anyone have suggestions about this?
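One knob that sometimes helps CPU latency (a suggestion, not something verified in this thread): recent TF releases let you multi-thread the TFLite interpreter. Also note that int8 kernels are not guaranteed to beat float on every CPU; it depends on whether the platform has optimized integer paths.

import tensorflow as tf

# num_threads is an official tf.lite.Interpreter argument in recent TF versions
interpreter = tf.lite.Interpreter(
    model_path='./checkpoints/yolov4-tiny-416-int8.tflite',
    num_threads=4)
interpreter.allocate_tensors()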

deep-rooteddz avatar Dec 29 '20 09:12 deep-rooteddz

@deep-rooteddz Thanks for sharing. I followed your steps to modify my detect.py file and ran it, but the results came out wrong; I didn't get results similar to @murdockhou's. Would you mind showing your inference results on kite.jpg? Thanks.

zero90169 avatar Jan 08 '21 07:01 zero90169


@zero90169 I found that I can run @hunglc007's code without problems; maybe he has fixed it. Of course, you can continue to use @murdockhou's method. Using yolov4-tiny-int8.tflite to detect kite.jpg, the accuracy is the same.

If you don't get a similar result, check config.py to confirm:

YOLO.CLASSES = "./data/classes/coco.names"
TRAIN.ANNOT_PATH = "./data/dataset/val2017.txt"
TEST.ANNOT_PATH = "./data/dataset/val2017.txt"

And you may need to modify YOLO.ANCHORS_TINY to match the anchors yolov4-tiny.weights was trained with (not strictly necessary, but it affects bounding-box size), e.g.:

__C.YOLO.ANCHORS_TINY         = [10,14, 23,27, 37,58, 81,82, 135,169, 344,319]
# __C.YOLO.ANCHORS_TINY         = [23,27, 37,58, 81,82, 81,82, 135,169, 344,319]

Here are my yolov4-tiny-int8.tflite results on kite.jpg: [image: result]

deep-rooteddz avatar Jan 09 '21 07:01 deep-rooteddz

@deep-rooteddz Thanks for your kind and quick reply. I expected to get as many boxes as in the kite.jpg shown by @murdockhou, but my int8 .tflite inference result on kite.jpg is more similar to yours. Maybe it's caused by the difference between "yolov4" and "yolov4-tiny". I will try @hunglc007's code again!

zero90169 avatar Jan 11 '21 07:01 zero90169

@deep-rooteddz Thanks, your solution works for me. Like you, I experience significant slowness relative to FP32: around 300 ms instead of 100 ms inference time. Have you been able to resolve this? I use tiny YOLOv4.

ybloch avatar Feb 09 '21 05:02 ybloch

(Quoting @deep-rooteddz's post-processing fix above.)

Thanks, it works on my Raspberry Pi 4, but why does the int8 model run slower than fp16 or fp32? I'm still not using any accelerator, just the CPU. For int8 I get 1.3 FPS, while fp16 and fp32 are almost the same at 2.3 FPS.

farhantandia avatar Feb 16 '21 01:02 farhantandia

Did anyone solve this issue? My int8-converted model does not work with the script,

python detect.py --weights ./checkpoints/yolov4-tiny-416-int8.tflite --size 416 --model yolov4 --image ./data/kite.jpg --framework tflite --tiny

and it also does not work in an Android app (the fp16-converted model works, though).

When I tested the int8-converted model, this error message was shown:

ValueError: Shapes (1, 13, 13) and (1, 26, 26) are incompatible

mhyeonsoo avatar Apr 06 '21 08:04 mhyeonsoo

@deep-rooteddz

Thanks, it works !

mipsan avatar May 18 '21 07:05 mipsan

@murdockhou @deep-rooteddz @mipsan Hi everyone, I am using @hunglc007's original detect.py with my own yolov4-tiny-int8.tflite. Unfortunately, it shows the same error as @mhyeonsoo. [screenshot: error]
My command: python detect.py --weights ./checkpoints/yolov4-tiny-224-int8.tflite --size 224 --model yolov4 --image ./data/person.jpg --framework tflite --tiny
My environment: Python 3.8, tf-nightly 2.6.0
Please give me some advice about the error, thanks.

KuoEuran avatar Jun 27 '21 11:06 KuoEuran

(Quoting @mhyeonsoo's comment above.)

Have you solved it? I have encountered the same problem. Looking forward to your reply, thanks!

Hanseyyyy avatar Dec 21 '21 09:12 Hanseyyyy

I'm also having problems trying to convert a YOLOv4 model to full int8 quantization. I don't have an answer yet, but I found a bug in the convert_tflite.py script that will always result in a model that is not fully int8-quantized. The lines:

  elif FLAGS.quantize_mode == 'int8':
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]

The second assignment to converter.target_spec.supported_ops supersedes the first one, which contains the int8 flag.
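A minimal sketch of the fix, assuming the goal is a fully int8 model (keep only the int8 ops set; dropping the TFLITE_BUILTINS/SELECT_TF_OPS line also avoids the Flex-delegate problem mentioned further down):

elif FLAGS.quantize_mode == 'int8':
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.representative_dataset = representative_data_gen
    # optional, for Edge TPU-style integer I/O:
    # converter.inference_input_type = tf.uint8
    # converter.inference_output_type = tf.uint8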

juandoso avatar May 22 '22 03:05 juandoso

(Quoting @mhyeonsoo's shape-mismatch error and @Hanseyyyy's follow-up above.)

The reason is that you need to add the post-processing after invoking the model, since it was removed from the save_model.py script (see @deep-rooteddz's snippet above).

UcefMountacer avatar Aug 26 '22 22:08 UcefMountacer

Hello,

I tried to run the int8 model on the Raspberry Pi 4 and got this error:

RuntimeError: Select TensorFlow op(s), included in the given model, is(are) not supported by this interpreter. Make sure you apply/link the Flex delegate before inference. For the Android, it can be resolved by adding "org.tensorflow:tensorflow-lite-select-tf-ops" dependency. See instructions: https://www.tensorflow.org/lite/guide/ops_selectNode number 79 (FlexFusedBatchNormV3) failed to prepare.

EDIT:

The same happens with float16.
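That RuntimeError means the converted model contains Select TF ops (the SELECT_TF_OPS fallback shown in the converter snippets above), which the standard interpreter on the Pi can't run without the Flex delegate. A hedged sketch of re-converting restricted to builtin kernels (saved_model path assumed; conversion will fail loudly if an op truly has no builtin equivalent):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('./checkpoints/yolov4-416')
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
open('./checkpoints/yolov4-416-builtin.tflite', 'wb').write(converter.convert())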

UcefMountacer avatar Aug 26 '22 22:08 UcefMountacer

(Quoting @farhantandia's comment above about int8 running slower than fp16/fp32 on the Pi.)

Hello, can you confirm that the float16 model is faster on the Raspberry Pi? Thanks.

UcefMountacer avatar Aug 26 '22 22:08 UcefMountacer