Run Model Maker Object Detection TFLite model inference directly
I've trained a model with MediaPipe Model Maker.
I want to run inference through TensorFlow directly, in Python, so I can use a Coral Edge TPU. Since it's a TFLite model, this should be possible.
But I'm struggling to get proper outputs.
For input, I resize to 256x256. I've tried normalisation to [0, 255], [0, 1] and [-1, 1].
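For reference, my preprocessing looks roughly like this ("test.jpg" is just a placeholder, and the [0, 1] normalisation shown is one of the variants I tried, not something I've confirmed the model expects):

import cv2
import numpy as np

# Sketch of my preprocessing; the normalisation range is part of the question
image = cv2.imread("test.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # assuming the model wants RGB
image = cv2.resize(image, (256, 256))
input_tensor = image[np.newaxis, ...].astype(np.float32) / 255.0  # [0, 1] variant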
Running the signature function returns a dictionary of {detection_boxes, detection_scores},
where shape(detection_boxes) = (1, num_boxes, 4)
and shape(detection_scores) = (1, num_boxes, num_classes).
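Concretely, the inference call looks something like this (a sketch; the model file name and the input keyword are my guesses, so check runner.get_input_details() for the real input name):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mediapipe/exported_model/model.tflite")
runner = interpreter.get_signature_runner()  # assumes the model has a single signature
output = runner(inputs=input_tensor)  # 'inputs' is a guess; see runner.get_input_details()
print(output["detection_boxes"].shape, output["detection_scores"].shape)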
However, the values I'm getting for detection_boxes are unnormalised and frequently negative. I've tried searching the repo for how decoding is done and for the expected input pre-processing, but it's hard to traverse this repo.
Is there a minimal example of how to perform inference directly, and decode model output?
Failing that, what input does the model expect, and what format are the output detection_boxes and detection_scores?
(Code: https://gist.github.com/DoctorDinosaur/be495b6065fff29f79ec11306dd89c3b)
Seems like I need to transform the boxes with the anchor values.
It's unclear from tensors_to_detections_calculator.cc how anchor values are calculated for Model Maker models, so for now I'm taking them from the metadata.json generated by Model Maker.
import json

import numpy as np

# Load the SSD anchors that Model Maker writes into the exported metadata
with open("mediapipe/exported_model/metadata.json") as f:
    metadata = json.load(f)

anchors = metadata["subgraph_metadata"][0]["custom_metadata"][0]["data"][
    "ssd_anchors_options"
]["fixed_anchors_schema"]["anchors"]

# Convert the list of dictionaries to a (num_boxes, 4) array of
# [x_center, y_center, width, height] rows
anchors = np.array(
    [
        [anchor["x_center"], anchor["y_center"], anchor["width"], anchor["height"]]
        for anchor in anchors
    ],
    dtype=np.float32,
)
metadata = None  # free the parsed JSON
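As a quick sanity check (my assumption being that the anchors are normalised [x_center, y_center, width, height] values):

# Should be (num_boxes, 4), matching the second dimension of the model outputs
print(anchors.shape)
# I'd expect roughly [0, 1] values if the anchors are normalised
print(anchors.min(), anchors.max())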
import tensorflow as tf

# Raw outputs: detection_boxes is (1, num_boxes, 4),
# detection_scores is (1, num_boxes, num_classes)
raw_boxes = output["detection_boxes"]
scores = output["detection_scores"]

x_scale = 1
y_scale = 1
w_scale = 1
h_scale = 1

# Decode the raw offsets against the anchors; anchors is (num_boxes, 4) as
# [x_center, y_center, width, height], broadcast over the batch dimension
x_center = raw_boxes[:, :, 0] / x_scale * anchors[np.newaxis, :, 2] + anchors[np.newaxis, :, 0]
y_center = raw_boxes[:, :, 1] / y_scale * anchors[np.newaxis, :, 3] + anchors[np.newaxis, :, 1]
width = np.exp(raw_boxes[:, :, 2] / w_scale) * anchors[np.newaxis, :, 2]
height = np.exp(raw_boxes[:, :, 3] / h_scale) * anchors[np.newaxis, :, 3]

# Centre/size to [ymin, xmin, ymax, xmax], the corner order NMS expects
boxes = np.stack(
    [
        y_center - height / 2,
        x_center - width / 2,
        y_center + height / 2,
        x_center + width / 2,
    ],
    axis=-1,
)

# combined_non_max_suppression wants boxes shaped [batch, num_boxes, q, 4];
# the boxes here are class-agnostic, so q = 1
nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = (
    tf.image.combined_non_max_suppression(
        boxes[:, :, np.newaxis, :],
        scores,
        max_output_size_per_class=5,
        max_total_size=25,
        iou_threshold=0.2,
        score_threshold=0.5,
        clip_boxes=True,
    )
)
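To plot, I scale the (presumably normalised, [ymin, xmin, ymax, xmax]) NMS output back to pixel coordinates, along these lines:

# Keep only the valid detections for the single image in the batch
n = int(valid_detections[0])
img_h, img_w = 256, 256  # size of the image being drawn on
for box, score, cls in zip(
    nmsed_boxes[0, :n].numpy(), nmsed_scores[0, :n].numpy(), nmsed_classes[0, :n].numpy()
):
    ymin, xmin, ymax, xmax = box
    print(int(cls), float(score), xmin * img_w, ymin * img_h, xmax * img_w, ymax * img_h)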
This produces values much closer to the correct ones, but some still look wrong, and there are some negative values. Plotting the NMS-ed result, though, I do get some boxes in the right place, so I assume I'm still doing something wrong somewhere.
Again, is there a working example for decoding?
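In the meantime, one thing I want to rule out (an assumption on my part, going by the other fields in metadata.json): the decode scales may not all be 1, and the raw scores may need a sigmoid. Something like this, where the field names are my guess at the schema and unverified:

import json

import numpy as np

with open("mediapipe/exported_model/metadata.json") as f:
    metadata = json.load(f)

# Assumption: the custom metadata also carries a "tensors_decoding_options" block
decoding = metadata["subgraph_metadata"][0]["custom_metadata"][0]["data"][
    "tensors_decoding_options"
]
x_scale, y_scale = decoding["x_scale"], decoding["y_scale"]
w_scale, h_scale = decoding["w_scale"], decoding["h_scale"]

# If the model emits raw logits, apply a sigmoid before thresholding/NMS
if decoding.get("sigmoid_score"):
    scores = 1.0 / (1.0 + np.exp(-output["detection_scores"]))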
Hi @DoctorDinosaur, we don't have any examples for running the TFLite model directly with custom NMS calculations. You will need to write custom code to implement this if you wish to go down this route.
This issue has been marked stale because it has had no recent activity in the past 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.