Inconsistent TFLite Model Results Between detect.py and Custom Inference Code
Search before asking
- [X] I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Description:

Hi,

I've converted a YOLOv5s model to a TFLite model using the export.py script provided in YOLOv5. The exported file is named best-fp16.tflite. It works well with the official detect.py script: the predictions are accurate, with good bounding box alignment and confidence scores.
However, when I perform inference using my custom Python code, the results are noticeably worse:
- Bounding boxes are misaligned.
- Confidence scores are significantly lower.
Here is the detect.py output: [screenshot]

Here is the custom code output: [screenshot]
I know the labels in the output are incorrect (e.g., "person") because I forgot to update coco.yaml, but the main issue lies in the quality of the detections.

The input shape is [1, 640, 640, 3] and the output shape is [1, 25200, 7] (per prediction: 4 box coordinates, 1 objectness score, and 2 class probabilities).

Here is my custom code for detection:
```python
import cv2
import numpy as np
import tensorflow as tf

# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Function to preprocess the image
def preprocess_image(image_path, input_size=(640, 640)):
    # Load the image
    image = cv2.imread(image_path)
    if image is None:
        raise ValueError("Image not found or could not be loaded")
    # Convert image to RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Resize image to the model input size (640x640)
    image_resized = cv2.resize(image, input_size)
    # Normalize pixel values to [0, 1]
    image_resized = image_resized / 255.0
    # Add batch dimension (shape becomes [1, 640, 640, 3])
    input_image = np.expand_dims(image_resized, axis=0).astype(np.float32)
    return input_image, image  # Return processed image and original image

# Function to decode raw YOLOv5 output
def decode_output(output_data, input_size, confidence_threshold=0.2, iou_threshold=0.4):
    boxes = output_data[..., :4]        # Extract bounding box data
    confidence = output_data[..., 4]    # Extract objectness confidence scores
    class_probs = output_data[..., 5:]  # Extract class probabilities
    # Compute final confidence scores
    scores = confidence[..., np.newaxis] * class_probs
    # Filter predictions based on confidence threshold
    valid_detections = np.where(scores.max(axis=-1) > confidence_threshold)
    boxes = boxes[valid_detections]
    scores = scores[valid_detections]
    class_ids = np.argmax(scores, axis=-1)
    # Scale normalized boxes to input size (assumes input_size is square)
    input_h, input_w = input_size
    boxes[:, 0] *= input_w  # Scale x_center
    boxes[:, 1] *= input_h  # Scale y_center
    boxes[:, 2] *= input_w  # Scale width
    boxes[:, 3] *= input_h  # Scale height
    # Convert boxes from (x_center, y_center, width, height) to (xmin, ymin, xmax, ymax)
    boxes[:, 0] = boxes[:, 0] - boxes[:, 2] / 2  # xmin
    boxes[:, 1] = boxes[:, 1] - boxes[:, 3] / 2  # ymin
    boxes[:, 2] = boxes[:, 0] + boxes[:, 2]      # xmax (xmin + width)
    boxes[:, 3] = boxes[:, 1] + boxes[:, 3]      # ymax (ymin + height)
    # Perform Non-Maximum Suppression (NMS)
    nms_indices = tf.image.non_max_suppression(
        boxes,
        scores.max(axis=-1),
        max_output_size=50,  # Max number of detections
        iou_threshold=iou_threshold,
        score_threshold=confidence_threshold
    ).numpy()
    # Return final filtered boxes, scores, and class IDs
    return boxes[nms_indices], scores[nms_indices].max(axis=-1), class_ids[nms_indices]

# Function to draw bounding boxes on an image
def draw_boxes(image, boxes, scores, class_ids, input_size, rescale=False):
    image_draw = image.copy()
    if rescale:
        # Rescale boxes from the model input size back to the original image size
        original_h, original_w = image.shape[:2]
        boxes[:, [0, 2]] *= (original_w / input_size[0])  # Scale X coordinates
        boxes[:, [1, 3]] *= (original_h / input_size[1])  # Scale Y coordinates
    # Draw bounding boxes
    for box, score, class_id in zip(boxes, scores, class_ids):
        if score > 0.2:  # Debug: lower threshold for easier testing
            xmin, ymin, xmax, ymax = box.astype(int)
            label = f"Class {int(class_id)}: {score:.2f}"
            # Draw bounding box
            cv2.rectangle(image_draw, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
            # Draw label
            cv2.putText(image_draw, label, (xmin, ymin - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return image_draw

# Function to run inference and draw bounding boxes
def predict_and_draw_boxes(image_path, interpreter, input_size=(640, 640)):
    # Preprocess the image
    input_image, original_image = preprocess_image(image_path, input_size)
    print("Input image shape:", input_image.shape)        # Debugging info
    print("Original image shape:", original_image.shape)  # Debugging info
    # Set input tensor
    interpreter.set_tensor(input_details[0]['index'], input_image)
    # Run inference
    interpreter.invoke()
    # Get output tensor
    output_data = interpreter.get_tensor(output_details[0]['index'])
    print("Output data shape:", output_data.shape)  # Debugging info
    # Decode output
    boxes, scores, class_ids = decode_output(output_data[0], input_size, confidence_threshold=0.2)
    # Draw bounding boxes on the original image
    original_boxes_image = draw_boxes(original_image, boxes, scores, class_ids, input_size, rescale=True)
    # Convert RGB back to BGR before saving (preprocess_image converted the
    # original to RGB, and cv2.imwrite expects BGR)
    original_boxes_image = cv2.cvtColor(original_boxes_image, cv2.COLOR_RGB2BGR)
    # Save the image with the bounding boxes
    original_output_path = "detection_original.jpg"
    cv2.imwrite(original_output_path, original_boxes_image)
    print(f"Image with detections saved as {original_output_path}")

# Run prediction and draw bounding boxes
predict_and_draw_boxes("test2.jpg", interpreter)
```
How can I get the same results as detect.py? Do I need to add any preprocessing for the image or postprocessing for the output?

I want to match detect.py's results so that I can port the model to Flutter and run detections on phones.
Additional
No response
👋 Hello @yAlqubati, thank you for your interest in YOLOv5 🚀! It looks like you're experiencing differences between detect.py results and your custom inference script. This is a great question and an Ultralytics engineer will assist you soon!

For a 🐛 Bug Report, could you please confirm whether detect.py and your custom code are using the same preprocessing and postprocessing logic? Differences here often lead to inconsistencies in results.
If not already done, providing a minimum reproducible example (MRE) with a simplified version of your test image, model, and code is immensely helpful for debugging.
Requirements
Ensure that you are using Python>=3.8.0 with all necessary libraries installed and that the environment is correctly set up with matching TensorFlow Lite configurations.
Additional Debugging Tips
- Verify that both the `detect.py` script and your custom code are normalizing images in the same way (e.g., pixel value scaling to [0, 1], image resizing dimensions, etc.).
- Check that the YOLOv5 model postprocessing steps, such as Non-Maximum Suppression (NMS), confidence thresholds, and class probability decoding, are consistent with YOLOv5's implementation.
- Confirm that your input tensor shape (`[1, 640, 640, 3]`) matches what YOLOv5 expects, and ensure there are no deviations in input formats or scaling (a quick inspection sketch follows this list).
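As a quick way to act on the last tip, here is a minimal inspection sketch (assuming the model file is named `best-fp16.tflite` as in the report above):

```python
import tensorflow as tf

# Minimal sketch: inspect what the exported TFLite model actually expects.
# Assumes the file name "best-fp16.tflite" from the report above.
interpreter = tf.lite.Interpreter(model_path="best-fp16.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

print("input shape :", inp['shape'])         # expect [1, 640, 640, 3]
print("input dtype :", inp['dtype'])         # float32 for an FP16/FP32 export
print("input quant :", inp['quantization'])  # (0.0, 0) means no quantization
print("output shape:", out['shape'])         # expect [1, 25200, 7]
print("output quant:", out['quantization'])
```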
Once these items are cross-verified, aligning the results should be easier. An engineer will follow up shortly with further recommendations 🚀.
@yAlqubati the difference in results between detect.py and your custom code is likely due to inconsistencies in image preprocessing and/or output postprocessing. YOLOv5's detect.py handles these details precisely, so you will need to align your custom code accordingly.
Suggestions:

- Image Preprocessing:
  - Ensure that the input image is normalized to `[0, 1]` and matches the TFLite model's expected input format (see the letterbox sketch after this list).
  - Verify that the input tensor is scaled appropriately using quantization parameters if the TFLite model is quantized (e.g., FP16 or INT8). You can check this by inspecting `input_details[0]['quantization']`.
- Postprocessing:
  - YOLOv5 outputs are in normalized (0-1) `x_center, y_center, width, height` format. Ensure you correctly scale and convert these to `(xmin, ymin, xmax, ymax)` coordinates relative to the original image dimensions (see the coordinate rescaling sketch below).
  - Use non-maximum suppression (NMS) parameters consistent with `detect.py`. The confidence threshold and IoU threshold in your code (`confidence_threshold=0.2`, `iou_threshold=0.4`) may differ from the defaults in `detect.py`.
- Debugging:
  - Compare the input tensor passed to the TFLite model in your custom code with the one in `detect.py` to ensure they are identical.
  - Inspect the raw model outputs (before postprocessing) in both cases to confirm they match.
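One concrete preprocessing difference worth ruling out first: `detect.py` letterboxes the image (aspect-ratio-preserving resize plus gray padding) instead of stretching it with a plain `cv2.resize`, which is a common cause of misaligned boxes and depressed confidences. Below is a minimal letterbox sketch, simplified from YOLOv5's own `letterbox` utility (the padding color 114 is the YOLOv5 default; treat this as an approximation rather than the exact implementation):

```python
import cv2

def letterbox(image, new_size=640, color=(114, 114, 114)):
    """Resize to new_size x new_size keeping aspect ratio, padding the rest."""
    h, w = image.shape[:2]
    r = min(new_size / h, new_size / w)                  # scale ratio
    new_w, new_h = int(round(w * r)), int(round(h * r))  # unpadded target size
    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    dw, dh = (new_size - new_w) / 2, (new_size - new_h) / 2  # padding per side
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    padded = cv2.copyMakeBorder(resized, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=color)
    # ratio and padding are needed later to map boxes back to the original image
    return padded, r, (dw, dh)
```

In the `preprocess_image` function above, this would replace the plain `cv2.resize` call, and the returned ratio and padding would be carried into postprocessing.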
For reference, you can review the YOLOv5 TFLite inference example provided in the YOLOv5 TFLite Export Guide. It includes preprocessing and postprocessing steps that align with detect.py.
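On the postprocessing side, once letterboxing is used the simple width/height ratio rescale in `draw_boxes` above no longer applies: the padding must be subtracted and the scale ratio divided out, mirroring what YOLOv5's `scale_boxes` helper does. A sketch under that assumption, consuming the `ratio` and `(dw, dh)` returned by the letterbox function above:

```python
import numpy as np

def undo_letterbox(boxes, ratio, pad, original_shape):
    """Map (xmin, ymin, xmax, ymax) boxes from letterboxed model space back to
    original image pixels. ratio and pad come from the letterbox sketch above."""
    dw, dh = pad
    boxes = boxes.copy().astype(np.float32)
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - dw) / ratio  # undo x padding + scale
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - dh) / ratio  # undo y padding + scale
    h, w = original_shape[:2]
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, w)      # clip to image bounds
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, h)
    return boxes
```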
Let us know if you encounter further issues!
Thanks for the suggestions! I've verified that the image preprocessing follows your steps, and the model is FP32, not quantized; would this be an issue?

For postprocessing, I've ensured correct scaling and NMS parameters (confidence_threshold=0.2, iou_threshold=0.4). The input tensor shape ([1, 640, 640, 3]) is the same in both the custom code and detect.py. However, I couldn't print the output shape in detect.py. Could you provide guidance on how to resolve this, or any other steps I should check?

Let me know if there's anything else I can try!
@yAlqubati you're on the right track, and the FP32 model should not cause issues. To inspect the output shape in detect.py, you can modify the script to print the output tensor shape after inference by adding a line like print(output.shape) (where output is the raw model output). Additionally, ensure your custom code applies YOLOv5's exact preprocessing and postprocessing steps, including the sigmoid activation on output values (if needed) and proper handling of anchor grids. If the issue persists, comparing raw outputs (before postprocessing) between detect.py and your code may help identify discrepancies. Let us know if further assistance is needed!
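For orientation, the raw forward pass inside `detect.py`'s loop looks roughly like the excerpt below in recent YOLOv5 versions (variable names and placement vary by version, so treat this as a sketch rather than exact code); the added `print` exposes the raw output shape before NMS:

```python
# Approximate excerpt from detect.py's per-image loop (varies by version)
pred = model(im, augment=augment, visualize=visualize)

# Added debugging line: for this model the raw output should be (1, 25200, 7)
print("raw output shape:", pred.shape if hasattr(pred, "shape") else [p.shape for p in pred])

pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
```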
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐