
Confusion about width, height, rotation of saved prediction output (x,y,w,h,r)

Open Petros626 opened this issue 11 months ago • 18 comments

Search before asking

  • [x] I have searched the Ultralytics YOLO issues and discussions and found no similar questions.

Question

In https://github.com/ultralytics/ultralytics/issues/20441 I exported the predictions in my desired format (x,y,w,h,r) and compared them with the GTs. What I observe is that y is not precise, and length and width seem switched. The rotation outputs ([-pi/4, 3pi/4]) don't match either.

From my understanding, for e.g. a 3D BBox:

  • YOLO width = length
  • YOLO height = width

Car (GT: 1, Pred: 1):

Match 1 (distance: 1.41m):
Ground Truth:
  type: Car
  location (x,y,z): (-5.58, 1.77, 35.37)
  dimensions (h,w,l): (1.45, 1.62, 3.83)
  rotation_y: 1.52rad (87.1°)

Nearest Prediction:
  type: Car
  location (x,y,z): (-5.60, 0.36, 35.43)
  dimensions (h,w,l): (0.00, 4.35, 2.11)
  rotation_y: -1.59rad (-91.1°)

Can you tell me whether w and h must be switched every time, or why the predictions don't seem right? Additionally, why is the y-coordinate so different?

Thank you so much!

Additional

Predicted labels:

1 263.897 282.975 21.0666 43.4668 0.0148887 0.881846
2 331.346 253.133 8.82084 9.56901 -0.00208643 0.00446377
2 370.42 207.72 10.1142 10.5897 0.0349597 0.00217024
2 203.962 431.016 9.02832 10.6538 0.00160701 0.00170111

Predicted labels in camera frame:

Car 0.0 0 0.0 0.0 0.00 4.35 2.11 -5.60 0.36 35.43 -1.59 0.881846
Pedestrian 0.0 0 0.0 0.0 0.00 0.96 0.88 1.14 0.32 38.41 -1.57 0.00446377
Pedestrian 0.0 0 0.0 0.0 0.00 1.06 1.01 5.05 0.32 42.95 -1.61 0.00217024
Pedestrian 0.0 0 0.0 0.0 0.00 1.07 0.90 -11.60 0.27 20.63 -1.57 0.00170111

GTs (camera frame):

Car 0.0 0 1.68 32.68 1.45 1.62 3.83 -5.58 1.77 35.37 1.52 -1.0
Pedestrian 0.0 0 1.13 30.63 1.74 0.47 0.55 5.15 1.51 42.82 1.24 -1.0
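For reference, here is a minimal sketch of how one of the saved prediction lines above can be split into named fields. The field order (class cx cy w h r conf) is my own reading of the save_txt + save_conf dump in this thread, not an official spec:

```python
def parse_obb_line(line: str) -> dict:
    """Split one saved OBB prediction line into named fields.

    Assumed field order: class cx cy w h r conf.
    """
    cls, cx, cy, w, h, r, conf = line.split()
    return {
        "class": int(cls),
        "cx": float(cx), "cy": float(cy),
        "w": float(w), "h": float(h),
        "r": float(r),        # rotation in radians
        "conf": float(conf),  # present because save_conf=True
    }

sample = "1 263.897 282.975 21.0666 43.4668 0.0148887 0.881846"
print(parse_obb_line(sample))
```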

Petros626 avatar May 03 '25 15:05 Petros626

👋 Hello @Petros626, thank you for reaching out and providing detailed context! 🚀 This is an automated response to help you get started—an Ultralytics engineer will be with you soon to assist further.

We recommend checking our Docs for guidance on output formats and predictions, including Python and CLI usage. Many common questions are addressed there.

If you believe this is a 🐛 bug, please provide a minimum reproducible example (MRE) so we can investigate efficiently.

For custom training or prediction output questions, include as much detail as possible:

  • Dataset image examples
  • Full prediction and ground truth samples
  • Your export and evaluation code
  • Training logs or settings used

Make sure you are following our Tips for Best Training Results.

Join the Ultralytics community for further discussion or troubleshooting.

Upgrade

Before proceeding, please ensure that you are running the latest ultralytics package and all requirements in a Python>=3.8 environment with PyTorch>=1.8:

pip install -U ultralytics

Environments

YOLO is verified in the following up-to-date environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. Our CI checks verify correct operation of all YOLO Modes and Tasks every 24 hours and on each commit across macOS, Windows, and Ubuntu.

Thank you for your patience and for helping us improve Ultralytics!

UltralyticsAssistant avatar May 03 '25 15:05 UltralyticsAssistant

This seems like a bug with your modification, because Ultralytics only returns angles between 0° and 90°, following the OpenCV definition. If you modified it to follow a different definition, then you would need to enforce that accordingly.

Y-T-G avatar May 03 '25 15:05 Y-T-G

I saved the predictions with save_txt and save_conf and used this line here:

line = (c, *(d.xywhr.view(-1) if is_obb else d.xyxyxyxyn.view(-1)))

The post-processing for val mode in head.py is still the default one ([-45°, 135°]):

angle = (angle.sigmoid() - 0.25) * math.pi

  1. When you look at the Predicted labels, the width is always smaller than the height, but shouldn't the width, especially for Car, always be greater? What is the Ultralytics YOLO convention?

  2. The wrong rotation is a mystery to me, because I didn't change anything in the source code.
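For what it's worth, the range implied by that decoding line can be checked numerically. This is a standalone sketch that reimplements the quoted expression, not Ultralytics code:

```python
import math

def decoded_angle(logit: float) -> float:
    """Replicates the head.py decoding: (sigmoid(logit) - 0.25) * pi."""
    return (1.0 / (1.0 + math.exp(-logit)) - 0.25) * math.pi

# sigmoid spans (0, 1), so the decoded angle spans (-pi/4, 3*pi/4)
print(decoded_angle(-20.0))  # approaches -pi/4
print(decoded_angle(20.0))   # approaches 3*pi/4
```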

Petros626 avatar May 03 '25 15:05 Petros626

Ultralytics uses OpenCV definition. The width and height have no relationship. Either of them may be greater than the other. Height is just what's vertical and width is horizontal. And the angle is the clockwise rotation with respect to positive x-axis.

https://github.com/ultralytics/ultralytics/issues/19642#issuecomment-2714931060
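A quick way to convince yourself of this convention is to expand an xywhr box into its corner points. This is my own helper for illustration (image coordinates with y pointing down, so a positive angle appears clockwise), not a library function:

```python
import math

def xywhr_to_corners(cx, cy, w, h, r):
    """Corner points of an oriented box: at r = 0, w runs along x
    (horizontal) and h along y (vertical); in image coordinates
    (y pointing down) a positive r appears as a clockwise rotation."""
    cos_r, sin_r = math.cos(r), math.sin(r)
    corners = []
    for dx, dy in ((-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2)):
        corners.append((cx + dx * cos_r - dy * sin_r,
                        cy + dx * sin_r + dy * cos_r))
    return corners

# at r = 0 the box is axis-aligned: width 4 horizontal, height 2 vertical
print(xywhr_to_corners(0, 0, 4, 2, 0.0))
```

Note that neither side is tied to the object's semantics; w and h are just the horizontal and vertical extents at r = 0.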

Y-T-G avatar May 03 '25 20:05 Y-T-G

For learning the OBB you use the OpenCV convention, but for validation (model.val()) the output of the sigmoid is [-45°, 135°] without post-processing to [0°, 90°]; only for predict mode does the OpenCV convention apply again.

According to your link, the angle should therefore be interpreted differently from the GT rotation under the OpenCV convention. This means the raw rotational output can fit my GT labels, even if the values look different.

Petros626 avatar May 04 '25 11:05 Petros626

The confusion is understandable. Let me clarify the OBB (Oriented Bounding Box) conventions in Ultralytics:

  1. Regarding width and height: Ultralytics follows OpenCV's convention where width and height are defined by the oriented box itself, not object semantics. This means width and height have no inherent relationship with the object's actual dimensions (like a car being longer than wide). The width and height in the model's output correspond to the oriented rectangle's dimensions.

  2. For rotation angles: During training/validation, the raw network output range is indeed [-π/4, 3π/4] as you noted from angle = (angle.sigmoid() - 0.25) * math.pi. However, when storing results or converting to other formats (using functions like xywhr2xyxyxyxy from ultralytics.utils.ops), these angles get regularized to the [0, π/2] range.

  3. The y-coordinate discrepancy is probably related to coordinate system differences between your 3D world space and the 2D image space where YOLO operates.

When comparing with your ground truth, you'll need to account for these conventions and potentially transform between coordinate systems properly. You might need to apply a mapping between your ground truth angles and the OpenCV convention angles.
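As a hedged illustration of point 2, the [0, π/2] regularization amounts to folding the angle with a modulo. This sketch shows only the angle fold; the library's regularize_rboxes additionally swaps width/height where needed:

```python
import math

def fold_angle(r: float) -> float:
    """Fold any angle into [0, pi/2), mirroring the t % (pi/2)
    regularization step (the w/h swap is not shown here)."""
    return r % (math.pi / 2)

# e.g. the negative Car rotation from this thread lands back in range
print(fold_angle(-1.59))
print(fold_angle(3 * math.pi / 4))
```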

glenn-jocher avatar May 05 '25 09:05 glenn-jocher

To answer 1: Yes this is correct.

To answer 2: In my case xyxyxyxy2xywhr is used to save predictions (I wanted the preds in xywhr format), probably related to the changed line (https://github.com/ultralytics/ultralytics/issues/20472#issuecomment-2848676988).

So finally I understand: the Ultralytics (OpenCV) convention, as @Y-T-G mentioned, uses clockwise rotation:

0-------------------> x (0 rad)
|  A-------------B
|  |             |
|  |     box     h
|  |   angle=0   |
|  D------w------C
v
y (pi/2 rad)

Petros626 avatar May 05 '25 10:05 Petros626

Ultralytics uses OpenCV definition. The width and height have no relationship. Either of them may be greater than the other. Height is just what's vertical and width is horizontal. And the angle is the clockwise rotation with respect to positive x-axis.

#19642 (comment)

When the r prediction output of the model is a CW rotation from 0° to 90°, why are some rotation angles negative (e.g. -29°)? This returns negative values: angle_rad = d.xywhr.view(-1)[4].item()

Petros626 avatar May 06 '25 12:05 Petros626

Probably a badly trained model

Y-T-G avatar May 06 '25 15:05 Y-T-G

Does the OpenCV definition in xyxyxyxy2xywhr not ensure consistency?

Is this workaround valid?

from math import pi

def normalize_angle_pred(angle):
    """Normalize angle to the [0, pi/2] range."""
    while angle > pi / 2:
        angle -= pi / 2
    while angle < 0:
        angle += pi / 2
    return angle
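For finite inputs, a single modulo gives the same folding as the two loops (except exactly at positive multiples of π/2, where the modulo maps to 0 while the loop keeps π/2). A quick self-check with my own function name:

```python
from math import pi

def normalize_angle_mod(angle: float) -> float:
    """One-step equivalent of the loop-based normalization."""
    return angle % (pi / 2)

# spot-check a few raw angles, including negatives from this thread
for a in (-1.59, -0.00208643, 0.0349597, 3.0):
    folded = normalize_angle_mod(a)
    assert 0 <= folded < pi / 2
print("all angles folded into [0, pi/2)")
```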

Petros626 avatar May 06 '25 15:05 Petros626

You can use regularize_rboxes if you want to force them to be between 0-90

https://github.com/ultralytics/ultralytics/blob/ac73bc4c36c4e210ab619d695eb97335c5380697/ultralytics/utils/ops.py#L805

Y-T-G avatar May 06 '25 19:05 Y-T-G

I would like to understand the origin of the problem.

Petros626 avatar May 06 '25 20:05 Petros626

@Petros626 The issue is in the different stages of angle handling in the OBB implementation:

  1. During model forward pass, angles are produced in the range [-π/4, 3π/4] as seen in head.py:

    angle = (angle.sigmoid() - 0.25) * math.pi  # [-pi/4, 3pi/4]
    
  2. When using d.xywhr directly to access raw prediction values, you're getting these unregularized angles, which is why you see negative values.

  3. Normally, these angles get regularized to [0, π/2] through regularize_rboxes during post-processing, which happens when results are saved or displayed. This function handles angle normalization by:

    t = t % (math.pi / 2)  # regularized boxes
    

The correct solution is to use regularize_rboxes from ultralytics.utils.ops on your results rather than creating a custom normalization function. This ensures consistency with Ultralytics' implementation and handles edge cases like aspect ratio swapping when angles exceed π/2.

from ultralytics.utils.ops import regularize_rboxes
regularized_boxes = regularize_rboxes(raw_boxes)  # raw_boxes in xywhr format
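If you want to see what that call does without pulling in torch, the per-box effect can be approximated in plain Python. This is my own sketch of the behavior, not the library implementation (the real function operates on whole tensors):

```python
import math

def regularize_box(cx, cy, w, h, r):
    """Approximate single-box version of the regularization:
    fold the angle into [0, pi/2), swapping w and h when the
    fold crosses a quarter turn, so the box shape is preserved."""
    r = r % math.pi              # a rectangle repeats every half turn
    if r >= math.pi / 2:
        w, h = h, w              # quarter turn: the sides swap roles
        r -= math.pi / 2
    return cx, cy, w, h, r

# the negative Car angle from this thread folds back into range
print(regularize_box(263.9, 283.0, 21.1, 43.5, -1.59))
```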

glenn-jocher avatar May 07 '25 03:05 glenn-jocher

After the forward pass in head.py, xyxyxyxy2xywhr is called, which I believed brings [-pi/4, 3pi/4] into [0, pi/2], but that is not the case. I didn't expect that an additional function like regularize_rboxes has to be called...

Petros626 avatar May 07 '25 07:05 Petros626

@Petros626 You've identified a key point in the workflow! The function xyxyxyxy2xywhr converts format but doesn't normalize angles - it simply preserves the angle values during the conversion.

Looking at the xyxyxyxy2xywhr implementation in utils/ops.py, it uses cv2.minAreaRect() which returns angles in the range [-90°, 0°] and then converts to radians. This explains why you're seeing negative angles in your raw outputs.

The final normalization to [0, π/2] happens explicitly through regularize_rboxes, which is called automatically in some visualization and export paths but not when directly accessing the raw prediction properties. This separation of concerns allows more flexibility in the implementation.

If you want consistent [0, π/2] angles when working directly with the prediction properties, you should explicitly call:

from ultralytics.utils.ops import regularize_rboxes
normalized_boxes = regularize_rboxes(your_boxes_tensor)  # where your_boxes_tensor is in xywhr format

This will handle both the angle normalization and any necessary width/height swapping when angles are outside the preferred range.

glenn-jocher avatar May 07 '25 14:05 glenn-jocher

Hold on, with OpenCV >= 4.5.1 (I have 4.11) I expected [0°, 90°].

Petros626 avatar May 07 '25 15:05 Petros626

@Petros626 You're correct to be confused about this behavior. Let me clarify what's happening:

With OpenCV 4.5.1+, cv2.minAreaRect() returns angles in the range [-90°, 0°], and this is what's used inside xyxyxyxy2xywhr. The function doesn't normalize these angles to [0, 90°] - it just converts them to radians.

The complete workflow is:

  1. Model predicts angles in range [-π/4, 3π/4]
  2. When converting between formats with xyxyxyxy2xywhr, it uses cv2.minAreaRect() which returns angles in [-90°, 0°]
  3. The normalization to [0, π/2] happens separately through regularize_rboxes

This separation allows flexibility, but means you need to explicitly call regularize_rboxes when working with raw angles:

from ultralytics.utils.ops import regularize_rboxes
normalized_boxes = regularize_rboxes(boxes)  # boxes in xywhr format

There's no automatic angle normalization when accessing raw prediction properties, which explains why you're seeing negative angles.

glenn-jocher avatar May 08 '25 00:05 glenn-jocher

You should refer to this: [image attachment]

[image attachment]

Honestly, I expected from the Ultralytics framework, using an OpenCV version 4.5.1+, that cv2.minAreaRect() returns angles in the range [0°, 90°] and NOT [-90°, 0°]. For me this is a bug and misleading to the user.

Petros626 avatar May 08 '25 11:05 Petros626

@Y-T-G I come back to this statement: "Height is just what's vertical and width is horizontal". When I post-process the prediction results, should I then just assign:

wrote initially (posted first):

cx => center_x, 
cy => center_y,
w (vert.) => length,
h (hori.) => width,
angle (rad) => rotation, 
conf => score


updated (corrected):

cx => center_x, 
cy => center_y,
w (hori.) => length,
h (vert.) => width,
angle (rad) => rotation, 
conf => score

updated: w=horizontal, h=vertical

At the end I want to assign these values to objects: [image attachment]
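The corrected mapping from the edited list above can be written down directly. The field names on the right are my own labels for the downstream annotation, not Ultralytics API names:

```python
def to_object_fields(cx, cy, w, h, angle, conf):
    """Map an xywhr+conf prediction to the corrected field names:
    w is horizontal (along the x-axis), h is vertical (along the y-axis)."""
    return {
        "center_x": cx,
        "center_y": cy,
        "length": w,        # horizontal extent
        "width": h,         # vertical extent
        "rotation": angle,  # radians
        "score": conf,
    }

# the first Predicted labels row from this thread
print(to_object_fields(263.897, 282.975, 21.0666, 43.4668, 0.0148887, 0.881846))
```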

Petros626 avatar May 30 '25 09:05 Petros626

Width is horizontal (along x-axis). Height is vertical (along y-axis).

Y-T-G avatar May 30 '25 10:05 Y-T-G

So the assignments are correct?

Petros626 avatar May 30 '25 11:05 Petros626

Except width and height, the rest are correct

Y-T-G avatar May 30 '25 11:05 Y-T-G

@Y-T-G What is the correct assignment? Based on your description, this would be the assignment, or not?

Or do I have to use θ_oc as the reference, where w, h correspond to length, width?

Petros626 avatar May 30 '25 17:05 Petros626

Width is horizontal (along x-axis). Height is vertical (along y-axis).

This is the correct one. You're using the opposite for width and height.

Y-T-G avatar May 31 '25 07:05 Y-T-G

@Y-T-G so the description inside the brackets was wrong? Maybe you are speaking about image space and I about object geometry.

I don't get it, can you show the solution please?

Petros626 avatar May 31 '25 09:05 Petros626

The one you edited is correct. The one you wrote initially was incorrect.

Y-T-G avatar May 31 '25 10:05 Y-T-G