feat(annotators): enhance label annotators with frame boundary adjust…
🚀 Enhance label annotators with frame boundary adjustments and new base class
Description
This PR adds the ability to ensure labels stay within frame boundaries through a new ensure_in_frame parameter. When enabled, this functionality guarantees that text labels for bounding boxes near image edges remain visible by adjusting their position to fit within the frame.
The key improvements include:
- ✅ Text labels near edges now properly positioned within frame boundaries
- ✅ Implemented as an optional parameter (default: `False` to maintain backward compatibility)
- ✅ Works alongside the existing `smart_position` functionality with complementary behavior
While there may be occasional label overlaps in very busy frames when both `smart_position` and `ensure_in_frame` are enabled, running the smart positioning algorithm first typically yields better results overall.
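For reference, a minimal usage sketch of the new parameter (the frame and detections below are placeholders, not taken from the test code further down):

```python
import numpy as np
import supervision as sv

# Placeholder frame and a single detection whose box touches the image edge.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
detections = sv.Detections(
    xyxy=np.array([[0.0, 0.0, 120.0, 80.0]], dtype=np.float32),
    confidence=np.array([0.9], dtype=np.float32),
    class_id=np.array([0]),
)

# ensure_in_frame=True shifts the label so it stays inside the frame;
# with the default (False) the existing behavior is unchanged.
label_annotator = sv.LabelAnnotator(
    text_position=sv.Position.TOP_LEFT,
    smart_position=True,
    ensure_in_frame=True,
)
annotated = label_annotator.annotate(frame.copy(), detections=detections, labels=["car 0.90"])
```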
Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] This change requires a documentation update
How has this change been tested?
I tested this change with various image scenarios that have bounding boxes positioned near frame edges. The implementation was verified by:
- Comparing output images with and without the `ensure_in_frame` parameter enabled
- Testing cases with multiple objects near edges to ensure proper positioning
- Validating behavior when used in combination with `smart_position`
Example test code:

```python
import cv2
import numpy as np
import supervision as sv
from supervision.annotators.core import LabelAnnotator, BoxAnnotator
from PIL import Image, ImageDraw
import os
from typing import Optional, Tuple, List
def generate_mock_yolo_output(image_shape: Tuple[int, int, int]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
"""
Generates mock bounding box detections, confidence scores, and class predictions
for a given image shape. The function creates a set of detections, including
one that covers the whole image.
Args:
image_shape (Tuple[int, int, int]): The shape of the image (height, width, channels).
This is used to determine the boundaries for the generated bounding boxes.
Returns:
Tuple[np.ndarray, np.ndarray, np.ndarray]: A tuple containing:
- A NumPy array of bounding boxes (N, 4), where N is the number of detections.
Each box is defined as [xmin, ymin, xmax, ymax].
- A NumPy array of confidence scores (N,).
- A NumPy array of class labels (N,).
"""
image_height, image_width, _ = image_shape
num_detections = 100
# Generate random bounding boxes
xmin = np.random.randint(0, image_width, num_detections)
ymin = np.random.randint(0, image_height, num_detections)
xmax = np.random.randint(xmin + 20, image_width + 50, num_detections)
ymax = np.random.randint(ymin + 20, image_height + 50, num_detections)
bounding_boxes = np.stack([xmin, ymin, xmax, ymax], axis=1).astype(np.float32)
# Add a box that covers the whole image
full_image_box = np.array([0, 0, image_width, image_height], dtype=np.float32).reshape(1, 4)
bounding_boxes = np.concatenate([bounding_boxes, full_image_box], axis=0)
num_detections += 1
# Generate random confidence scores
confidence_scores = np.random.uniform(0.5, 0.95, num_detections).astype(np.float32)
confidence_scores[-1] = 0.99 # High confidence for the full image box
# Generate random class labels
class_labels = np.random.randint(0, 2, num_detections).astype(np.int32)
class_labels[-1] = 0 # Assign a class to the full image box
return bounding_boxes, confidence_scores, class_labels
def process_image_with_supervision(
image: np.ndarray,
display_image: bool = True,
text_position: sv.Position = sv.Position.TOP_LEFT,
smart_position: bool = False,
detections: Optional[sv.Detections] = None,
) -> None:
"""
Processes an image by simulating YOLO detection and using Supervision to annotate it.
The function generates two annotated images (with and without `ensure_in_frame`)
and stacks them vertically, adding headers and white boundaries for clarity.
Args:
image (np.ndarray): The input image as a NumPy array in BGR format.
display_image (bool, optional): Flag to control whether to display the image.
If True, it attempts to display the image. If False, it saves the
image to a file. Defaults to True.
text_position (sv.Position, optional): The position of the text label
relative to the bounding box. Defaults to sv.Position.TOP_LEFT.
smart_position (bool, optional): Flag to enable smart position adjustment of labels
to keep them within the image frame. Defaults to False.
detections (sv.Detections, optional): Pre-calculated detections.
If provided, the function uses these detections instead of generating new ones.
Defaults to None.
Returns:
None (displays or saves the stacked annotated image).
"""
# 1. Simulate YOLO model output or use provided
if detections is None:
bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(image.shape)
detections = sv.Detections(
xyxy=bounding_boxes,
confidence=confidence_scores,
class_id=class_labels,
)
# 2. Create annotators
box_annotator = BoxAnnotator(thickness=2)
class_names = ["car", "person"]
label_annotator_in_frame = LabelAnnotator(
text_scale=0.5,
text_thickness=1,
text_padding=5,
ensure_in_frame=True,
text_position=text_position,
smart_position=smart_position,
)
label_annotator_out_of_frame = LabelAnnotator(
text_scale=0.5,
text_thickness=1,
text_padding=5,
ensure_in_frame=False,
text_position=text_position,
smart_position=smart_position,
)
# 4. Annotate the image with the detections using both annotators.
annotated_image_in_frame = box_annotator.annotate(image.copy(), detections=detections)
labels_in_frame = [
f"{class_names[int(class_id)]} {confidence:.2f}" # Corrected f-string
for _, _, confidence, class_id, *_ in detections
]
annotated_image_in_frame = label_annotator_in_frame.annotate(
annotated_image_in_frame, detections=detections, labels=labels_in_frame
)
annotated_image_out_of_frame = box_annotator.annotate(image.copy(), detections=detections)
labels_out_of_frame = [
f"{class_names[int(class_id)]} {confidence:.2f}" # Corrected f-string
for _, _, confidence, class_id, *_ in detections
]
annotated_image_out_of_frame = label_annotator_out_of_frame.annotate(
annotated_image_out_of_frame, detections=detections, labels=labels_out_of_frame
)
# 5. Add white boundaries around the images
border_width = 3
annotated_image_in_frame = cv2.copyMakeBorder(
annotated_image_in_frame,
border_width,
border_width,
border_width,
border_width,
cv2.BORDER_CONSTANT,
value=(255, 255, 255),
)
annotated_image_out_of_frame = cv2.copyMakeBorder(
annotated_image_out_of_frame,
border_width,
border_width,
border_width,
border_width,
cv2.BORDER_CONSTANT,
value=(255, 255, 255),
)
# 6. Add headers to each image
header_height = 30
header_color = (255, 255, 255)
text_color = (0, 0, 0)
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = 0.7
font_thickness = 2
# Create header images for each annotated image
header_image_in_frame = np.zeros(
(header_height, annotated_image_in_frame.shape[1], 3), dtype=np.uint8
)
header_image_in_frame[:] = header_color
text_size_in_frame = cv2.getTextSize("Enabled", font, font_scale, font_thickness)[0]
text_x_in_frame = annotated_image_in_frame.shape[1] - text_size_in_frame[0] - 10
text_y_in_frame = (header_height + text_size_in_frame[1]) // 2
cv2.putText(
header_image_in_frame,
"Enabled",
(text_x_in_frame, text_y_in_frame),
font,
font_scale,
text_color,
font_thickness,
cv2.LINE_AA,
)
header_image_out_of_frame = np.zeros(
(header_height, annotated_image_out_of_frame.shape[1], 3), dtype=np.uint8
)
header_image_out_of_frame[:] = header_color
text_size_out_of_frame = cv2.getTextSize("Not Enabled", font, font_scale, font_thickness)[0]
text_x_out_of_frame = (header_image_out_of_frame.shape[1] - text_size_out_of_frame[0]) // 2
text_y_out_of_frame = (header_height + text_size_out_of_frame[1]) // 2
cv2.putText(
header_image_out_of_frame,
"Not Enabled",
(text_x_out_of_frame, text_y_out_of_frame),
font,
font_scale,
text_color,
font_thickness,
cv2.LINE_AA,
)
# Stack the headers and the images
annotated_image_in_frame_with_header = np.vstack(
(header_image_in_frame, annotated_image_in_frame)
)
annotated_image_out_of_frame_with_header = np.vstack(
(header_image_out_of_frame, annotated_image_out_of_frame)
)
# 7. Stack the two images vertically
stacked_image = np.vstack(
(annotated_image_in_frame_with_header, annotated_image_out_of_frame_with_header)
)
# Add position text to the top-left corner
cv2.putText(
stacked_image,
str(text_position) + f", smart_pos={smart_position}",
(10, 20),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 0, 0),
2,
cv2.LINE_AA,
)
# 8. Display the annotated image.
if display_image:
try:
pil_image = Image.fromarray(cv2.cvtColor(stacked_image, cv2.COLOR_BGR2RGB))
pil_image.show()
pil_image.close()
except OSError as e:
print(f"Error displaying image: {e}. Saving image instead.")
cv2.imwrite(f"annotated_image_{text_position}_smart_{smart_position}.jpg", stacked_image)
else:
cv2.imwrite(f"annotated_image_{text_position}_smart_{smart_position}.jpg", stacked_image)
print(f"Annotated image saved to annotated_image_{text_position}_smart_{smart_position}.jpg")
def main(image_path: str = "example.jpg") -> None:
"""
Main function to run the image processing and annotation with different label positions
and smart position settings.
Args:
image_path (str, optional): Path to the image file. Defaults to "example.jpg".
"""
# Create a dummy image
image = np.zeros((600, 800, 3), dtype=np.uint8)
cv2.imwrite(image_path, image)
# 1. Generate Detections once - Moved inside process_image_with_supervision
# mock_bounding_boxes, mock_confidence_scores, mock_class_labels = generate_mock_yolo_output(image.shape)
# detections = sv.Detections(
# xyxy=mock_bounding_boxes,
# confidence=confidence_scores,
# class_id=mock_class_labels,
# )
# 2. Loop through positions with smart_position=False
positions = [
sv.Position.TOP_LEFT,
sv.Position.CENTER_LEFT,
sv.Position.BOTTOM_RIGHT,
sv.Position.CENTER_RIGHT,
]
for position in positions:
print(f"Processing image with text position: {position}, smart_position=False")
process_image_with_supervision(image, display_image=False, text_position=position, smart_position=False) # Removed detections
# 3. Loop through positions with smart_position=True, using the same detections
for position in positions:
print(f"Processing image with text position: {position}, smart_position=True")
process_image_with_supervision(image, display_image=False, text_position=position, smart_position=True) # Removed detections
os.remove(image_path)
if __name__ == "__main__":
main()
```

Any specific deployment considerations
No special deployment considerations are needed. This feature is implemented as an optional parameter that defaults to False, ensuring backward compatibility with existing code.
Docs
- [ ] Docs updated? What were the changes:
No changes to docs, as the functionality is similar to `smart_position` and the only entry for this in the docs was in the changelog. I can update the documentation to include this new parameter in the appropriate class references if desired; just let me know where and in what format.
Hello @hidara2000, thank you for this awesome PR!
I made my first quick comments about certain changes. Let me test it as well.
Makes sense. Changes ticked off. Cheers for a great tool!
Hi @hidara2000 👋🏻 Huge thanks for deciding to submit a PR to introduce this change! I have a couple of points I'd like to discuss before I dive deeper into the PR review:
Wouldn't it be a better approach to keep the `smart_position` flag and simply add this extra behavior when `smart_position=True`? I understand that these two features could be seen as separate operations, but I'm still leaning towards maintaining a simple API:
- `smart_position=False` - raw, unprocessed label positions
- `smart_position=True` - we do everything we can to make them as visible as possible, roughly as sketched below
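Something along these lines (a sketch of the proposed semantics only, not current behavior):

```python
import supervision as sv

# smart_position=False -> labels are rendered at their raw anchor positions.
plain = sv.LabelAnnotator(smart_position=False)

# smart_position=True -> labels are de-overlapped and kept inside the frame,
# i.e. the boundary adjustment from this PR would be folded into this flag.
smart = sv.LabelAnnotator(smart_position=True)
```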
For some time now, I've wanted to add support for multiline labels / label wrapping. Considering you're completely rewriting both label annotators, would you be willing to add support for multiline labels / label wrapping as part of this PR?
📝 Add Multiline Text Support to Label Annotators
🔄 Updates to Previous PR
This extends my previous PR that added frame boundary adjustments by incorporating support for multiline text in label annotators. The implementation now properly handles both newlines in text and automatic text wrapping.
✨ New Features
- 🔤 Multiline Text Support: Labels now properly render text with newlines (`\n`)
- 📏 Auto Text Wrapping: New `max_line_length` parameter controls automatic text wrapping
- 🧠 Enhanced Smart Positioning: Improved algorithm to prevent overlapping multiline labels
- 🔄 Two-Phase Spreading: More effective label distribution with size-aware positioning
🛠️ Implementation Details
- Added `max_line_length` parameter to existing annotator classes
- Used Python's `textwrap` library for robust text wrapping functionality (see the sketch after this list)
- Enhanced smart positioning to better handle varying text box sizes
- Properly calculated dimensions for multiline text boxes
- Implemented size-aware box spreading to reduce overlaps
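Purely for illustration (hypothetical helper names, not the actual implementation in this PR), the wrapping and multiline box sizing can be sketched with `textwrap` and `cv2.getTextSize`:

```python
from typing import List, Optional, Tuple
import textwrap

import cv2


def wrap_label(text: str, max_line_length: Optional[int]) -> List[str]:
    # Honor explicit "\n" first, then wrap each segment when wrapping is enabled.
    lines: List[str] = []
    for segment in text.split("\n"):
        if max_line_length is None:
            lines.append(segment)
        else:
            lines.extend(textwrap.wrap(segment, width=max_line_length) or [""])
    return lines


def multiline_text_size(
    lines: List[str], font: int, scale: float, thickness: int, line_spacing: float = 1.25
) -> Tuple[int, int]:
    # Box width is the widest line; box height is the sum of the line heights.
    widths, heights = [], []
    for line in lines:
        (w, h), _ = cv2.getTextSize(line, font, scale, thickness)
        widths.append(w)
        heights.append(int(h * line_spacing))
    return max(widths), sum(heights)


lines = wrap_label("This is a very long label that will be wrapped automatically", 20)
box_w, box_h = multiline_text_size(lines, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
```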
📊 Before/After Comparison
📚 Usage Example

```python
# Create a label annotator with multiline text support
label_annotator = sv.LabelAnnotator(
text_padding=10,
smart_position=True, # Works with existing smart positioning
max_line_length=20 # Enable text wrapping at 20 characters
)
# Labels can have manual newlines or will auto-wrap
labels = [
"Car\nLicense: ABC-123", # Manual newlines
"This is a very long label that will be wrapped automatically" # Auto-wrapped
]
# Use as normal
annotated_image = label_annotator.annotate(
scene=image,
detections=detections,
labels=labels
)
```

🧪 Test Code
Here's the code I used to test the multiline text support:

```python
def process_image_with_supervision(
image: np.ndarray,
display_image: bool = True,
text_position: sv.Position = sv.Position.TOP_LEFT,
smart_position: bool = False,
detections: Optional[sv.Detections] = None,
) -> None:
# 1. Simulate YOLO model output or use provided
if detections is None:
bounding_boxes, confidence_scores, class_labels = generate_mock_yolo_output(
image.shape
)
detections = sv.Detections(
xyxy=bounding_boxes,
confidence=confidence_scores,
class_id=class_labels,
)
# 2. Create annotators
box_annotator = BoxAnnotator(thickness=2)
class_names = ["This is\na\ncar", "This is a really really really long label"]
label_annotator_smart = LabelAnnotator(
text_scale=0.5,
text_thickness=1,
text_padding=5,
text_position=text_position,
smart_position=True,
max_line_length=12, # Enable text wrapping at 12 characters
)
label_annotator_not_smart = LabelAnnotator(
text_scale=0.5,
text_thickness=1,
text_padding=5,
text_position=text_position,
smart_position=False,
)
# 3. Annotate the image with both configurations
annotated_image_smart = box_annotator.annotate(image.copy(), detections=detections)
labels_smart = [
f"{class_names[int(class_id)]} {confidence:.2f}"
for _, _, confidence, class_id, *_ in detections
]
annotated_image_smart = label_annotator_smart.annotate(
annotated_image_smart, detections=detections, labels=labels_smart
)
annotated_image_not_smart = box_annotator.annotate(
image.copy(), detections=detections
)
labels_not_smart = [
f"{class_names[int(class_id)]} {confidence:.2f}"
for _, _, confidence, class_id, *_ in detections
]
annotated_image_not_smart = label_annotator_not_smart.annotate(
annotated_image_not_smart, detections=detections, labels=labels_not_smart
)
# 4. Create comparison image and save
# ... (display and saving code omitted for brevity)
```

I tested with various text positions:

```python
positions = [
sv.Position.TOP_LEFT,
sv.Position.CENTER_LEFT,
sv.Position.BOTTOM_RIGHT,
sv.Position.CENTER_RIGHT,
]
for position in positions:
process_image_with_supervision(
image, display_image=False, text_position=position, smart_position=True
)
```

🔍 Performance Note
The enhanced smart positioning uses a two-phase approach that maintains good performance in most real-world scenarios. For scenes with many labels, the visual improvement in label placement is well worth the minimal additional processing time.
🔄 Compatibility
This change is backward compatible. The `max_line_length` parameter is optional (default: `None`), so existing code will continue to work without modification.
Hi @hidara2000, sorry it took me a while to get back to you. I'm currently juggling work across 3–4 repositories, so my time is a bit stretched. I’ve now gone through your PR carefully and you’ve done an excellent job—really impressive work! Don’t be discouraged by the number of comments I left—they’re all meant to help polish things up. Once we merge this PR, it’ll take Supervision’s text annotators to the next level!
I appreciate you going through it, and I agree with all the comments. Changes made as per advice and results from test below.
Hi @hidara2000 👋🏻 thanks a lot for your contribution! It took us a lot of time but we finally merged the PR. It'll be included in our release tomorrow.
Hi @hidara2000 👋🏻 are you on LinkedIn? I'd like to mention / tag you in supervision-0.26.0 release post.
Thanks a lot for all the help and patience! 🙏🏻 I tagged you in supervision-0.26.0 release post.