Confusion about width, height, rotation of saved prediction output (x,y,w,h,r)
Search before asking
- [x] I have searched the Ultralytics YOLO issues and discussions and found no similar questions.
Question
Here https://github.com/ultralytics/ultralytics/issues/20441 I exported the predictions in my desired format (x, y, w, h, r) and compared them with the GTs. What I observe is that y is not precise and length and width seem switched. The rotation outputs ([-pi/4, 3pi/4]) don't match either.
From my understanding:
YOLO width = length (e.g. of a 3D BBox)
YOLO height = width (likewise)
Car (GT: 1, Pred: 1):
Match 1 (distance: 1.41m):
Ground Truth:
type: Car
location (x,y,z): (-5.58, 1.77, 35.37)
dimensions (h,w,l): (1.45, 1.62, 3.83)
rotation_y: 1.52rad (87.1°)
Nearest Prediction:
type: Car
location (x,y,z): (-5.60, 0.36, 35.43)
dimensions (h,w,l): (0.00, 4.35, 2.11)
rotation_y: -1.59rad (-91.1°)
Can you tell me if w and h must be switched every time, or why the predictions don't seem right? Additionally, why is the y-coordinate so different?
Thank you so much!
Additional
Predicted labels:
1 263.897 282.975 21.0666 43.4668 0.0148887 0.881846
2 331.346 253.133 8.82084 9.56901 -0.00208643 0.00446377
2 370.42 207.72 10.1142 10.5897 0.0349597 0.00217024
2 203.962 431.016 9.02832 10.6538 0.00160701 0.00170111
Predicted labels in camera frame:
Car 0.0 0 0.0 0.0 0.00 4.35 2.11 -5.60 0.36 35.43 -1.59 0.881846
Pedestrian 0.0 0 0.0 0.0 0.00 0.96 0.88 1.14 0.32 38.41 -1.57 0.00446377
Pedestrian 0.0 0 0.0 0.0 0.00 1.06 1.01 5.05 0.32 42.95 -1.61 0.00217024
Pedestrian 0.0 0 0.0 0.0 0.00 1.07 0.90 -11.60 0.27 20.63 -1.57 0.00170111
GTs (camera frame):
Car 0.0 0 1.68 32.68 1.45 1.62 3.83 -5.58 1.77 35.37 1.52 -1.0
Pedestrian 0.0 0 1.13 30.63 1.74 0.47 0.55 5.15 1.51 42.82 1.24 -1.0
Seems like a bug with your modification because Ultralytics only returns angles between 0-90 degrees and it follows the OpenCV definition. If you modified it to follow a different definition, then you would need to enforce that accordingly.
I saved the predictions with save_txt and save_conf and used this line:
line = (c, *(d.xywhr.view(-1) if is_obb else d.xyxyxyxyn.view(-1)))
The post-processing for val mode in head.py is still the default one ([-45°, 135°]):
angle = (angle.sigmoid() - 0.25) * math.pi
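That decoding can be checked numerically: a sigmoid output in (0, 1) is shifted by 0.25 and scaled by π, which maps the extremes onto (-π/4, 3π/4). A minimal stand-alone sketch mimicking that one line from head.py:

```python
import math

def decode_angle(raw_logit):
    """Mimic head.py's angle decoding: sigmoid output in (0, 1)
    shifted and scaled into the (-pi/4, 3pi/4) angle range."""
    sig = 1.0 / (1.0 + math.exp(-raw_logit))
    return (sig - 0.25) * math.pi

# extreme logits approach the range bounds, confirming [-pi/4, 3pi/4]
print(decode_angle(-50.0))  # ~ -pi/4
print(decode_angle(50.0))   # ~  3pi/4
print(decode_angle(0.0))    # sigmoid = 0.5 -> pi/4
```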
- When you look at the Predicted labels, the width is always smaller than the height, but shouldn't the width, especially for a Car, always be greater? What is the Ultralytics YOLO convention?
- The wrong rotation is a mystery to me, because I didn't change anything in the source code.
Ultralytics uses OpenCV definition. The width and height have no relationship. Either of them may be greater than the other. Height is just what's vertical and width is horizontal. And the angle is the clockwise rotation with respect to positive x-axis.
https://github.com/ultralytics/ultralytics/issues/19642#issuecomment-2714931060
For learning the OBB you use the OpenCV convention, but for validation (model.val()) the output is the raw sigmoid range [-45°, 135°] without the post-processing to [0°, 90°]; only for predict mode does the OpenCV convention apply again.
According to your link, the angle should therefore be interpreted differently from the GT rotation, per the OpenCV convention. This means the raw rotation output should match my GT labels, even if the values look different.
The confusion is understandable. Let me clarify the OBB (Oriented Bounding Box) conventions in Ultralytics:
- Regarding width and height: Ultralytics follows OpenCV's convention, where width and height are defined by the oriented box itself, not by object semantics. This means width and height have no inherent relationship with the object's actual dimensions (like a car being longer than it is wide). The width and height in the model's output correspond to the oriented rectangle's dimensions.
- For rotation angles: During training/validation, the raw network output range is indeed [-π/4, 3π/4], as you noted from angle = (angle.sigmoid() - 0.25) * math.pi. However, when storing results or converting to other formats (using functions like xywhr2xyxyxyxy from ultralytics.utils.ops), these angles get regularized to the [0, π/2] range.
- The y-coordinate discrepancy is probably related to coordinate system differences between your 3D world space and the 2D image space where YOLO operates.
When comparing with your ground truth, you'll need to account for these conventions and potentially transform between coordinate systems properly. You might need to apply a mapping between your ground truth angles and the OpenCV convention angles.
To answer 1: Yes this is correct.
To answer 2: In my case xyxyxyxy2xywhr is used to save predictions (wanted preds as xywhr format), probably related to the changed line (https://github.com/ultralytics/ultralytics/issues/20472#issuecomment-2848676988).
So finally I understand that the Ultralytics (OpenCV) convention, as @Y-T-G mentioned, uses clockwise rotation:
0-------------------> x (0 rad)
| A-------------B
| | |
| | box h
| | angle=0 |
| D------w------C
v
y (pi/2 rad)
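The convention in the diagram (origin top-left, y pointing down, angle measured clockwise from the positive x-axis) can be illustrated by expanding an xywhr box into its four corners. This is only a sketch for intuition; for real conversions Ultralytics provides xywhr2xyxyxyxy in ultralytics.utils.ops:

```python
import math

def xywhr_to_corners(cx, cy, w, h, r):
    """Corners A, B, C, D of a box rotated clockwise by r in image
    coordinates (y-axis points down), matching the diagram above."""
    cos_r, sin_r = math.cos(r), math.sin(r)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # with y pointing down, the standard rotation matrix turns
        # points clockwise on screen
        corners.append((cx + dx * cos_r - dy * sin_r,
                        cy + dx * sin_r + dy * cos_r))
    return corners

# angle 0: axis-aligned box, A=(3,4) B=(7,4) C=(7,6) D=(3,6)
print(xywhr_to_corners(5, 5, 4, 2, 0.0))
```

At r = π/2 the same call shows the box's width lying along the image y-axis, which is why w and h alone say nothing about object length versus object width.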
Ultralytics uses OpenCV definition. The width and height have no relationship. Either of them may be greater than the other. Height is just what's vertical and width is horizontal. And the angle is the clockwise rotation with respect to positive x-axis.
If the model's r prediction output is a CW rotation from 0° to 90°, why are some rotation angles negative (e.g. -29°)?
This returns negative values: angle_rad = d.xywhr.view(-1)[4].item()
Probably a badly trained model
Does the OpenCV definition in xyxyxyxy2xywhr not ensure consistency?
Is this workaround valid?
from math import pi

def normalize_angle_pred(angle):
    """Normalize angle to the [0, pi/2] range."""
    while angle > pi / 2:
        angle -= pi / 2
    while angle < 0:
        angle += pi / 2
    return angle
You can use regularize_rboxes if you want to force them to be between 0-90
https://github.com/ultralytics/ultralytics/blob/ac73bc4c36c4e210ab619d695eb97335c5380697/ultralytics/utils/ops.py#L805
I would like to understand the origin of the problem.
@Petros626 The issue is in the different stages of angle handling in the OBB implementation:
- During the model forward pass, angles are produced in the range [-π/4, 3π/4], as seen in head.py: angle = (angle.sigmoid() - 0.25) * math.pi  # [-pi/4, 3pi/4]
- When using d.xywhr directly to access raw prediction values, you're getting these unregularized angles, which is why you see negative values.
- Normally, these angles get regularized to [0, π/2] through regularize_rboxes during post-processing, which happens when results are saved or displayed. This function handles the angle normalization via t = t % (math.pi / 2)  # regularized boxes
The correct solution is to use regularize_rboxes from ultralytics.utils.ops on your results rather than creating a custom normalization function. This ensures consistency with Ultralytics' implementation and handles edge cases like aspect ratio swapping when angles exceed π/2.
from ultralytics.utils.ops import regularize_rboxes
regularized_boxes = regularize_rboxes(raw_boxes) # raw_boxes in xywhr format
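Conceptually, the regularization folds the angle into [0, π/2) and swaps width and height on every odd quarter-turn, since a (w, h) box at angle t is the same rectangle as an (h, w) box at t − π/2. A pure-Python sketch of that idea (illustrative only, not the library implementation):

```python
import math

def regularize_rbox_sketch(x, y, w, h, t):
    """Fold a rotated box's angle into [0, pi/2), swapping w and h when
    the fold crosses an odd number of quarter-turns (sketch only)."""
    k = math.floor(t / (math.pi / 2))  # quarter-turns folded away
    t = t % (math.pi / 2)
    if k % 2:  # odd fold count: the two side lengths trade roles
        w, h = h, w
    return x, y, w, h, t

# -29 degrees folds one quarter-turn up to ~61 degrees, sides swapped
print(regularize_rbox_sketch(0, 0, 4, 2, math.radians(-29)))
```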
After the forward pass in head.py, xyxyxyxy2xywhr is called, which I believed brings [-pi/4, 3pi/4] into [0, pi/2], but that is not the case. I didn't expect that an additional function like regularize_rboxes has to be called.
@Petros626 You've identified a key point in the workflow! The function xyxyxyxy2xywhr converts format but doesn't normalize angles - it simply preserves the angle values during the conversion.
Looking at the xyxyxyxy2xywhr implementation in utils/ops.py, it uses cv2.minAreaRect() which returns angles in the range [-90°, 0°] and then converts to radians. This explains why you're seeing negative angles in your raw outputs.
The final normalization to [0, π/2] happens explicitly through regularize_rboxes, which is called automatically in some visualization and export paths but not when directly accessing the raw prediction properties. This separation of concerns allows more flexibility in the implementation.
If you want consistent [0, π/2] angles when working directly with the prediction properties, you should explicitly call:
from ultralytics.utils.ops import regularize_rboxes
normalized_boxes = regularize_rboxes(your_boxes_tensor) # where your_boxes_tensor is in xywhr format
This will handle both the angle normalization and any necessary width/height swapping when angles are outside the preferred range.
Hold on, with OpenCV >= 4.5.1 (I have 4.11) I expected [0°, 90°].
@Petros626 You're correct to be confused about this behavior. Let me clarify what's happening:
With OpenCV 4.5.1+, cv2.minAreaRect() returns angles in the range [-90°, 0°], and this is what's used inside xyxyxyxy2xywhr. The function doesn't normalize these angles to [0, 90°] - it just converts them to radians.
The complete workflow is:
- The model predicts angles in the range [-π/4, 3π/4]
- When converting between formats with xyxyxyxy2xywhr, it uses cv2.minAreaRect(), which returns angles in [-90°, 0°]
- The normalization to [0, π/2] happens separately through regularize_rboxes
This separation allows flexibility, but means you need to explicitly call regularize_rboxes when working with raw angles:
from ultralytics.utils.ops import regularize_rboxes
normalized_boxes = regularize_rboxes(boxes) # boxes in xywhr format
There's no automatic angle normalization when accessing raw prediction properties, which explains why you're seeing negative angles.
You should refer to this:
Honestly, I expected from the Ultralytics framework, when using OpenCV 4.5.1+, that cv2.minAreaRect() returns angles in the range [0°, 90°] and NOT [-90°, 0°]. To me this is a bug that misleads the user.
@Y-T-G I come back to this statement: "Height is just what's vertical and width is horizontal". When I post-process the prediction results, should I then just assign:
wrote initially (posted first):
cx => center_x,
cy => center_y,
w (vert.) => length,
h (hori.) => width,
angle (rad) => rotation,
conf => score
updated (corrected):
cx => center_x,
cy => center_y,
w (hori.) => length,
h (vert.) => width,
angle (rad) => rotation,
conf => score
updated: w=horizontal, h=vertical
At the end I want to assign these values for objects:
Width is horizontal (along x-axis). Height is vertical (along y-axis).
So are the assignments correct?
Except width and height, the rest are correct
@Y-T-G What is the correct assignment? Based on your description, would this be the assignment or not?
Or do I have to use θ_oc as reference, where w, h correspond to length, width?
Width is horizontal (along x-axis). Height is vertical (along y-axis).
This is the correct one. You're using the opposite for width and height.
@Y-T-G Was the description inside the brackets wrong? Maybe you are talking about image space and I about object geometry.
I don't get it; can you show the solution, please?
The one you edited is correct. The one you wrote initially was incorrect.
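Putting the confirmed assignment into code, a minimal sketch (the target field names are the ones used earlier in this thread; adapt them to your own schema):

```python
def to_object_fields(cx, cy, w, h, r, conf):
    """Map an xywhr prediction to the thread's target fields using the
    corrected assignment: w is horizontal (along x) -> length,
    h is vertical (along y) -> width."""
    return {
        "center_x": cx,
        "center_y": cy,
        "length": w,    # box width is horizontal (along x-axis)
        "width": h,     # box height is vertical (along y-axis)
        "rotation": r,  # radians, OpenCV-style clockwise angle
        "score": conf,
    }

# example using the first Car prediction from the saved labels above
print(to_object_fields(263.897, 282.975, 21.0666, 43.4668, 0.0148887, 0.881846))
```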