
feat(annotators): enhance label annotators with frame boundary adjust…

hidara2000 opened this pull request 7 months ago · 6 comments

🚀 Enhance label annotators with frame boundary adjustments and new base class

Description

This PR adds the ability to ensure labels stay within frame boundaries through a new ensure_in_frame parameter. When enabled, this functionality guarantees that text labels for bounding boxes near image edges remain visible by adjusting their position to fit within the frame.

The key improvements include:

  • ✅ Text labels near edges now properly positioned within frame boundaries
  • ✅ Implemented as an optional parameter (default: False to maintain backward compatibility)
  • ✅ Works alongside existing smart_position functionality with complementary behavior

While there may be occasional label overlaps in very busy frames when both smart_position and ensure_in_frame are enabled, running the smart positioning algorithm first typically yields better results overall.
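
A minimal usage sketch of the new parameter as described above (the surrounding image, detections, and labels are assumed to already exist):

import supervision as sv

# ensure_in_frame is the parameter added by this PR; it defaults to False,
# so existing code behaves exactly as before.
label_annotator = sv.LabelAnnotator(
    text_position=sv.Position.TOP_LEFT,
    smart_position=True,   # existing overlap-avoidance behavior
    ensure_in_frame=True,  # new: shift labels near the edges back inside the frame
)

annotated_image = label_annotator.annotate(
    scene=image, detections=detections, labels=labels
)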

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] This change requires a documentation update

How has this change been tested?

I tested this change with various image scenarios that have bounding boxes positioned near frame edges. The implementation was verified by:

  1. Comparing output images with and without the ensure_in_frame parameter enabled
  2. Testing cases with multiple objects near edges to ensure proper positioning
  3. Validating behavior when used in combination with smart_position

Example test code:

import cv2
import numpy as np
import supervision as sv
from supervision.annotators.core import LabelAnnotator, BoxAnnotator
from PIL import Image, ImageDraw
import os
from typing import Optional, Tuple, List


def generate_mock_yolo_output(image_shape: Tuple[int, int, int]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Generates mock bounding box detections, confidence scores, and class predictions
    for a given image shape.  The function creates a set of detections, including
    one that covers the whole image.

    Args:
        image_shape (Tuple[int, int, int]): The shape of the image (height, width, channels).
            This is used to determine the boundaries for the generated bounding boxes.

    Returns:
        Tuple[np.ndarray, np.ndarray, np.ndarray]: A tuple containing:
            - A NumPy array of bounding boxes (N, 4), where N is the number of detections.
              Each box is defined as [xmin, ymin, xmax, ymax].
            - A NumPy array of confidence scores (N,).
            - A NumPy array of class labels (N,).
    """
    image_height, image_width, _ = image_shape
    num_detections = 100
    
    # Generate random bounding boxes
    xmin = np.random.randint(0, image_width, num_detections)
    ymin = np.random.randint(0, image_height, num_detections)
    xmax = np.random.randint(xmin + 20, image_width + 50, num_detections)
    ymax = np.random.randint(ymin + 20, image_height + 50, num_detections)
    bounding_boxes = np.stack([xmin, ymin, xmax, ymax], axis=1).astype(np.float32)

    # Add a box that covers the whole image
    full_image_box = np.array([0, 0, image_width, image_height], dtype=np.float32).reshape(1, 4)
    bounding_boxes = np.concatenate([bounding_boxes, full_image_box], axis=0)
    num_detections += 1

    # Generate random confidence scores
    confidence_scores = np.random.uniform(0.5, 0.95, num_detections).astype(np.float32)
    confidence_scores[-1] = 0.99  # High confidence for the full image box

    # Generate random class labels
    class_labels = np.random.randint(0, 2, num_detections).astype(np.int32)
    class_labels[-1] = 0  # Assign a class to the full image box

    return bounding_boxes, confidence_scores, class_labels



def process_image_with_supervision(
    image: np.ndarray,
    display_image: bool = True,
    text_position: sv.Position = sv.Position.TOP_LEFT,
    smart_position: bool = False,
    detections: Optional[sv.Detections] = None,
) -> None:
    """
    Processes an image by simulating YOLO detection and using Supervision to annotate it.
    The function generates two annotated images (with and without `ensure_in_frame`)
    and stacks them vertically, adding headers and white boundaries for clarity.

    Args:
        image (np.ndarray): The input image as a NumPy array in BGR format.
        display_image (bool, optional): Flag to control whether to display the image.
            If True, it attempts to display the image. If False, it saves the
            image to a file. Defaults to True.
        text_position (sv.Position, optional): The position of the text label
            relative to the bounding box.  Defaults to sv.Position.TOP_LEFT.
        smart_position (bool, optional): Flag to enable smart position adjustment of labels
            to keep them within the image frame. Defaults to False.
        detections (sv.Detections, optional): Pre-calculated detections.
            If provided, the function uses these detections instead of generating new ones.
            Defaults to None.

    Returns:
        None (displays or saves the stacked annotated image).
    """
    # 1. Simulate YOLO model output or use provided
    if detections is None:
        bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(image.shape)
        detections = sv.Detections(
            xyxy=bounding_boxes,
            confidence=confidence_scores,
            class_id=class_labels,
        )

    # 2. Create annotators
    box_annotator = BoxAnnotator(thickness=2)
    class_names = ["car", "person"]

    label_annotator_in_frame = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        ensure_in_frame=True,
        text_position=text_position,
        smart_position=smart_position,
    )
    label_annotator_out_of_frame = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        ensure_in_frame=False,
        text_position=text_position,
        smart_position=smart_position,
    )

    # 4. Annotate the image with the detections using both annotators.
    annotated_image_in_frame = box_annotator.annotate(image.copy(), detections=detections)
    labels_in_frame = [
        f"{class_names[int(class_id)]} {confidence:.2f}"  # Corrected f-string
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_in_frame = label_annotator_in_frame.annotate(
        annotated_image_in_frame, detections=detections, labels=labels_in_frame
    )

    annotated_image_out_of_frame = box_annotator.annotate(image.copy(), detections=detections)
    labels_out_of_frame = [
        f"{class_names[int(class_id)]} {confidence:.2f}"  # Corrected f-string
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_out_of_frame = label_annotator_out_of_frame.annotate(
        annotated_image_out_of_frame, detections=detections, labels=labels_out_of_frame
    )

    # 5. Add white boundaries around the images
    border_width = 3
    annotated_image_in_frame = cv2.copyMakeBorder(
        annotated_image_in_frame,
        border_width,
        border_width,
        border_width,
        border_width,
        cv2.BORDER_CONSTANT,
        value=(255, 255, 255),
    )
    annotated_image_out_of_frame = cv2.copyMakeBorder(
        annotated_image_out_of_frame,
        border_width,
        border_width,
        border_width,
        border_width,
        cv2.BORDER_CONSTANT,
        value=(255, 255, 255),
    )

    # 6. Add headers to each image
    header_height = 30
    header_color = (255, 255, 255)
    text_color = (0, 0, 0)
    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 0.7
    font_thickness = 2

    # Create header images for each annotated image
    header_image_in_frame = np.zeros(
        (header_height, annotated_image_in_frame.shape[1], 3), dtype=np.uint8
    )
    header_image_in_frame[:] = header_color
    text_size_in_frame = cv2.getTextSize("Enabled", font, font_scale, font_thickness)[0]
    text_x_in_frame = annotated_image_in_frame.shape[1] - text_size_in_frame[0] - 10
    text_y_in_frame = (header_height + text_size_in_frame[1]) // 2
    cv2.putText(
        header_image_in_frame,
        "Enabled",
        (text_x_in_frame, text_y_in_frame),
        font,
        font_scale,
        text_color,
        font_thickness,
        cv2.LINE_AA,
    )

    header_image_out_of_frame = np.zeros(
        (header_height, annotated_image_out_of_frame.shape[1], 3), dtype=np.uint8
    )
    header_image_out_of_frame[:] = header_color
    text_size_out_of_frame = cv2.getTextSize("Not Enabled", font, font_scale, font_thickness)[0]
    text_x_out_of_frame = (header_image_out_of_frame.shape[1] - text_size_out_of_frame[0]) // 2
    text_y_out_of_frame = (header_height + text_size_out_of_frame[1]) // 2
    cv2.putText(
        header_image_out_of_frame,
        "Not Enabled",
        (text_x_out_of_frame, text_y_out_of_frame),
        font,
        font_scale,
        text_color,
        font_thickness,
        cv2.LINE_AA,
    )

    # Stack the headers and the images
    annotated_image_in_frame_with_header = np.vstack(
        (header_image_in_frame, annotated_image_in_frame)
    )
    annotated_image_out_of_frame_with_header = np.vstack(
        (header_image_out_of_frame, annotated_image_out_of_frame)
    )

    # 7. Stack the two images vertically
    stacked_image = np.vstack(
        (annotated_image_in_frame_with_header, annotated_image_out_of_frame_with_header)
    )

    # Add position text to the top-left corner
    cv2.putText(
        stacked_image,
        str(text_position) + f", smart_pos={smart_position}",
        (10, 20),
        cv2.FONT_HERSHEY_SIMPLEX,
        0.7,
        (0, 0, 0),
        2,
        cv2.LINE_AA,
    )

    # 8. Display the annotated image.
    if display_image:
        try:
            pil_image = Image.fromarray(cv2.cvtColor(stacked_image, cv2.COLOR_BGR2RGB))
            pil_image.show()
            pil_image.close()
        except OSError as e:
            print(f"Error displaying image: {e}. Saving image instead.")
            cv2.imwrite(f"annotated_image_{text_position}_smart_{smart_position}.jpg", stacked_image)
    else:
        cv2.imwrite(f"annotated_image_{text_position}_smart_{smart_position}.jpg", stacked_image)
        print(f"Annotated image saved to annotated_image_{text_position}_smart_{smart_position}.jpg")



def main(image_path: str = "example.jpg") -> None:
    """
    Main function to run the image processing and annotation with different label positions
    and smart position settings.

    Args:
        image_path (str, optional): Path to the image file. Defaults to "example.jpg".
    """
    # Create a dummy image
    image = np.zeros((600, 800, 3), dtype=np.uint8)
    cv2.imwrite(image_path, image)

    # 1. Detections are generated inside process_image_with_supervision for each run.

    # 2. Loop through positions with smart_position=False
    positions = [
        sv.Position.TOP_LEFT,
        sv.Position.CENTER_LEFT,
        sv.Position.BOTTOM_RIGHT,
        sv.Position.CENTER_RIGHT,
    ]
    for position in positions:
        print(f"Processing image with text position: {position}, smart_position=False")
        process_image_with_supervision(image, display_image=False, text_position=position, smart_position=False)

    # 3. Loop through positions with smart_position=True, using the same detections
    for position in positions:
        print(f"Processing image with text position: {position}, smart_position=True")
        process_image_with_supervision(image, display_image=False, text_position=position, smart_position=True)

    os.remove(image_path)



if __name__ == "__main__":
    main()

(Output comparison images for each tested text position, with and without ensure_in_frame enabled.)

Any specific deployment considerations

No special deployment considerations are needed. This feature is implemented as an optional parameter that defaults to False, ensuring backward compatibility with existing code.

Docs

  • [ ] Docs updated? What were the changes: No changes to the docs, as the functionality is similar to smart_position and the only mention of it in the docs was in the changelog. I can update the documentation to include this new parameter in the appropriate class references if desired; just let me know where and in what format.

hidara2000 · Apr 15 '25

CLA assistant check
All committers have signed the CLA.

CLAassistant · Apr 15 '25

Hello @hidara2000, thank you for this awesome PR!

I made my initial quick comments on a few of the changes. Let me test it as well.

Makes sense. Changes ticked off. Cheers for a great tool!

hidara2000 · Apr 15 '25

Hi @hidara2000 👋🏻 Huge thanks for deciding to submit a PR to introduce this change! I have a couple of points I'd like to discuss before I dive deeper into the PR review:

Wouldn't it be a better approach to keep the smart_position flag and simply add this extra behavior when smart_position=True? I understand that these two features could be seen as separate operations, but I'm still leaning towards maintaining a simple API:

  • smart_position=False - raw, unprocessed label positions
  • smart_position=True - we do everything we can to make them as visible as possible (a rough sketch of that adjustment follows this list)
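
For illustration, a rough sketch (a hypothetical helper, not the actual implementation) of the kind of adjustment meant by "as visible as possible": shifting a label box back inside the frame.

import numpy as np

def clamp_label_xyxy(label_xyxy: np.ndarray, resolution_wh: tuple) -> np.ndarray:
    """Shift a label box (x1, y1, x2, y2) so it lies fully inside the frame."""
    frame_w, frame_h = resolution_wh
    x1, y1, x2, y2 = label_xyxy
    # Push right if the box sticks out on the left, left if it sticks out on the right.
    dx = max(0, -x1) - max(0, x2 - frame_w)
    # Push down if it sticks out at the top, up if it sticks out at the bottom.
    dy = max(0, -y1) - max(0, y2 - frame_h)
    return np.array([x1 + dx, y1 + dy, x2 + dx, y2 + dy])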

For some time now, I've wanted to add support for multiline labels / label wrapping. Considering you're completely rewriting both label annotators, would you be willing to add support for multiline labels / label wrapping as part of this PR?

(Screenshot illustrating multiline labels / label wrapping.)

SkalskiP · Apr 16 '25

📝 Add Multiline Text Support to Label Annotators

🔄 Updates to Previous PR

This extends my previous PR that added frame boundary adjustments by incorporating support for multiline text in label annotators. The implementation now properly handles both newlines in text and automatic text wrapping.

✨ New Features

  • 🔤 Multiline Text Support: Labels now properly render text with newlines (\n)
  • 📏 Auto Text Wrapping: New max_line_length parameter controls automatic text wrapping
  • 🧠 Enhanced Smart Positioning: Improved algorithm to prevent overlapping multiline labels
  • 🔄 Two-Phase Spreading: More effective label distribution with size-aware positioning

🛠️ Implementation Details

  • Added max_line_length parameter to existing annotator classes
  • Used Python's textwrap library for robust text wrapping functionality (see the sketch after this list)
  • Enhanced smart positioning to better handle varying text box sizes
  • Properly calculated dimensions for multiline text boxes
  • Implemented size-aware box spreading to reduce overlaps
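
As a reference for how the wrapping and measuring steps fit together, here is a minimal sketch (names and defaults are illustrative, not the PR's actual internals):

import textwrap
from typing import List, Optional, Tuple

import cv2

def split_and_measure_label(
    text: str,
    max_line_length: Optional[int] = None,
    font: int = cv2.FONT_HERSHEY_SIMPLEX,
    text_scale: float = 0.5,
    text_thickness: int = 1,
) -> Tuple[List[str], int, int]:
    """Split a label on manual newlines, optionally wrap it, and measure the text block."""
    lines: List[str] = []
    for part in text.split("\n"):
        if max_line_length is not None:
            # textwrap.wrap returns [] for an empty string, so keep blank lines explicitly.
            lines.extend(textwrap.wrap(part, width=max_line_length) or [""])
        else:
            lines.append(part)

    # The block is as wide as its widest line and as tall as the sum of the line heights.
    # (The real annotator would also account for text_padding and inter-line spacing.)
    sizes = [cv2.getTextSize(line, font, text_scale, text_thickness)[0] for line in lines]
    block_width = max(width for width, _ in sizes)
    block_height = sum(height for _, height in sizes)
    return lines, block_width, block_height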

📊 Before/After Comparison

(Before/after comparison images for the multiline label support.)

📚 Usage Example

# Create a label annotator with multiline text support
label_annotator = sv.LabelAnnotator(
    text_padding=10,
    smart_position=True,  # Works with existing smart positioning
    max_line_length=20  # Enable text wrapping at 20 characters
)

# Labels can have manual newlines or will auto-wrap
labels = [
    "Car\nLicense: ABC-123",  # Manual newlines
    "This is a very long label that will be wrapped automatically"  # Auto-wrapped
]

# Use as normal
annotated_image = label_annotator.annotate(
    scene=image,
    detections=detections,
    labels=labels
)

🧪 Test Code

Here's the code I used to test the multiline text support:

def process_image_with_supervision(
    image: np.ndarray,
    display_image: bool = True,
    text_position: sv.Position = sv.Position.TOP_LEFT,
    smart_position: bool = False,
    detections: Optional[sv.Detections] = None,
) -> None:
    # 1. Simulate YOLO model output or use provided
    if detections is None:
        bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(
            image.shape
        )
        detections = sv.Detections(
            xyxy=bounding_boxes,
            confidence=confidence_scores,
            class_id=class_labels,
        )

    # 2. Create annotators
    box_annotator = BoxAnnotator(thickness=2)
    class_names = ["This is\na\ncar", "This is a really really really long label"]

    label_annotator_smart = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        text_position=text_position,
        smart_position=True,
        max_line_length=12,  # Enable text wrapping at 12 characters
    )
    label_annotator_not_smart = LabelAnnotator(
        text_scale=0.5,
        text_thickness=1,
        text_padding=5,
        text_position=text_position,
        smart_position=False,
    )

    # 3. Annotate the image with both configurations
    annotated_image_smart = box_annotator.annotate(image.copy(), detections=detections)
    labels_smart = [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_smart = label_annotator_smart.annotate(
        annotated_image_smart, detections=detections, labels=labels_smart
    )

    annotated_image_not_smart = box_annotator.annotate(
        image.copy(), detections=detections
    )
    labels_not_smart = [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for _, _, confidence, class_id, *_ in detections
    ]
    annotated_image_not_smart = label_annotator_not_smart.annotate(
        annotated_image_not_smart, detections=detections, labels=labels_not_smart
    )

    # 4. Create comparison image and save
    # ... (display and saving code omitted for brevity)

I tested with various text positions:

positions = [
    sv.Position.TOP_LEFT,
    sv.Position.CENTER_LEFT,
    sv.Position.BOTTOM_RIGHT,
    sv.Position.CENTER_RIGHT,
]
for position in positions:
    process_image_with_supervision(
        image, display_image=False, text_position=position, smart_position=True
    )

🔍 Performance Note

The enhanced smart positioning uses a two-phase approach that maintains good performance in most real-world scenarios. For scenes with many labels, the visual improvement in label placement is well worth the minimal additional processing time.

🔄 Compatibility

This change is backward compatible. The max_line_length parameter is optional (default: None), so existing code will continue to work without modification.

hidara2000 · Apr 17 '25

Hi @hidara2000, sorry it took me a while to get back to you. I'm currently juggling work across 3–4 repositories, so my time is a bit stretched. I’ve now gone through your PR carefully and you’ve done an excellent job—really impressive work! Don’t be discouraged by the number of comments I left—they’re all meant to help polish things up. Once we merge this PR, it’ll take Supervision’s text annotators to the next level!

SkalskiP · Apr 23 '25

I appreciate you going through it, and I agree with all the comments. Changes have been made as advised; results from the tests are below.

(Updated result images after addressing the review comments.)

hidara2000 · Apr 24 '25

Hi @hidara2000 👋🏻 thanks a lot for your contribution! It took us a while, but we finally merged the PR. It'll be included in our release tomorrow.

SkalskiP · Jul 15 '25

Hi @hidara2000 👋🏻 are you on LinkedIn? I'd like to mention / tag you in the supervision-0.26.0 release post.

SkalskiP · Jul 16 '25

Thanks a lot for all the help and patience! 🙏🏻 I tagged you in the supervision-0.26.0 release post.

SkalskiP · Jul 16 '25