MONAI `apply_affine_to_boxes` does not handle flipping properly, due to the left-closed and right-open nature of the boxes

Describe the bug

apply_affine_to_boxes does not handle flipping properly, because boxes are left-closed and right-open. When a box is flipped, its left-closed, right-open coordinates become left-open and right-closed, so the result should be converted back to the original left-closed, right-open convention.

To clarify, I know that a flip_boxes function is available; I use a plain flip here only to keep the example simple. The same issue exists for any more complicated affine matrix that contains a flip.

To Reproduce

import torch

from monai.apps.detection.transforms.box_ops import apply_affine_to_boxes
from monai.data import MetaTensor
import monai.transforms as mt

def main():
    for flip_axis in range(3):
        flip = mt.Flip(spatial_axis=flip_axis)
        x = torch.zeros(1, 1, 1, 1)
        flipped: MetaTensor = flip(x)
        box = torch.tensor([[0, 0, 0, 1, 1, 1]])
        box_flipped = apply_affine_to_boxes(box, flipped.affine.inverse())
        print(box_flipped)

if __name__ == '__main__':
    main()

Expected behavior

Produce the same results as flip_boxes.

Actual Results

tensor([[-1,  0,  0,  0,  1,  1]])
tensor([[ 0, -1,  0,  1,  0,  1]])
tensor([[ 0,  0, -1,  1,  1,  0]])

Suggestion

My suggestion for the fix is to convert the box coordinates to be closed on both sides before applying the affine matrix, and to convert them back to the desired format afterwards.
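
To make the suggestion concrete, here is a minimal plain-Python sketch of the idea. It is not MONAI code: the helper name `apply_affine_closed` is hypothetical, boxes are assumed to hold integer voxel indices in `[x0, y0, z0, x1, y1, z1]` order, and the affine is assumed to be a `(d+1) x (d+1)` nested list acting on voxel indices.

```python
from itertools import product


def apply_affine_closed(box, affine, spatial_dims=3):
    """Apply an affine to one half-open integer box via its closed form.

    Hypothetical helper for illustration only, not part of MONAI.
    """
    d = spatial_dims
    lo = list(box[:d])
    # Left-closed/right-open -> closed on both sides: last covered voxel index.
    hi = [v - 1 for v in box[d:]]
    mapped = []
    # Transform all 2**d corners of the closed box.
    for corner in product(*zip(lo, hi)):
        mapped.append(
            [sum(affine[i][j] * corner[j] for j in range(d)) + affine[i][d] for i in range(d)]
        )
    new_lo = [min(p[i] for p in mapped) for i in range(d)]
    # Closed -> left-closed/right-open again.
    new_hi = [max(p[i] for p in mapped) + 1 for i in range(d)]
    return new_lo + new_hi


# Flip along axis 0 of an image with 4 voxels on that axis: x' = 3 - x.
flip_x = [[-1, 0, 0, 3], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(apply_affine_closed([0, 0, 0, 1, 1, 1], flip_x))  # [3, 0, 0, 4, 1, 1]
```

With the 1-voxel image from the reproduction script the flip affine has zero translation, and the same helper returns `[0, 0, 0, 1, 1, 1]`, which is the result expected above. The `- 1` / `+ 1` conversion only makes sense for integer index boxes; continuous coordinates would need a different treatment.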

Additional Context

If I understand correctly, the left-closed, right-open convention is documented here: https://github.com/Project-MONAI/MONAI/blob/e1a69b03c86ce065db2816b696ea4a6b57d46435/monai/data/box_utils.py#L40-L45

Feature Request

This is not the main topic of this issue, but I would also like to mention that in the code above I compute the inverse of the affine matrix myself. So far, this is the only way I have found to apply an arbitrary affine transform to boxes. It would be nice if apply_affine_to_boxes could do this internally by solving a linear system, for numerical stability.
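
To spell the request out (the symbols here are mine, not MONAI's): let $A$ be the $4 \times 4$ affine matrix and $\tilde{P}$ the $4 \times N$ matrix whose columns are the box corners in homogeneous coordinates. Applying the inverse transform and solving a linear system give the same $\tilde{Q}$, but the second form never builds $A^{-1}$ explicitly:

```math
\tilde{Q} = A^{-1}\,\tilde{P}
\quad\Longleftrightarrow\quad
A\,\tilde{Q} = \tilde{P}
```

In PyTorch the right-hand form could be handed to a solver such as `torch.linalg.solve(A, P)` instead of calling `A.inverse()` first.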

May 04 '24 17:05 function2-llx

Hi @function2-llx, I'm not exactly sure what outcome you're expecting. It appears the result is correct as the six digits represent coordinates in the XYZXYZ format.

Thanks.

May 06 '24 07:05 KumoLiu

@KumoLiu Hello, the format is indeed correct; however, the coordinate values do not match those from flip_boxes. Let me run it again with that function for comparison:

import torch

from monai.apps.detection.transforms.box_ops import apply_affine_to_boxes, flip_boxes
from monai.data import MetaTensor
import monai.transforms as mt

def main():
    for flip_axis in range(3):
        flip = mt.Flip(spatial_axis=flip_axis)
        x = torch.zeros(1, 1, 1, 1)
        flipped: MetaTensor = flip(x)
        box = torch.tensor([[0, 0, 0, 1, 1, 1]])
        box_flipped = apply_affine_to_boxes(box, flipped.affine.inverse())
        print('wrong:', box_flipped)
        box_flipped_correct = flip_boxes(box, x.shape[1:], flip_axis)
        print('correct:', box_flipped_correct)

if __name__ == '__main__':
    main()

The results are:

wrong: tensor([[-1,  0,  0,  0,  1,  1]])
correct: tensor([[0, 0, 0, 1, 1, 1]])
wrong: tensor([[ 0, -1,  0,  1,  0,  1]])
correct: tensor([[0, 0, 0, 1, 1, 1]])
wrong: tensor([[ 0,  0, -1,  1,  1,  0]])
correct: tensor([[0, 0, 0, 1, 1, 1]])

May 06 '24 07:05 function2-llx

Hi @function2-llx, the difference you're seeing arises from the second argument you're passing into the flip_boxes function, which is the spatial_size of the image. The spatial_size parameter dictates the axis along which you intend to flip. For more details, please take a look at the code here. If you set it to [0, 0, 0], you will get the same result as apply_affine_to_boxes.
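
The arithmetic behind that statement can be sketched in plain Python. This is a stand-in for illustration, not MONAI's code: the helper name `flip_half_open` is hypothetical, and it assumes that flip_boxes reflects the two bounds on the flipped axis as `spatial_size - coordinate` and swaps them.

```python
def flip_half_open(box, spatial_size, axis, spatial_dims=3):
    """Flip one [x0, y0, z0, x1, y1, z1] box along `axis`.

    Hypothetical stand-in for illustration only, not part of MONAI.
    """
    flipped = list(box)
    # The old upper bound becomes the new lower bound, and vice versa.
    flipped[axis] = spatial_size[axis] - box[axis + spatial_dims]
    flipped[axis + spatial_dims] = spatial_size[axis] - box[axis]
    return flipped


box = [0, 0, 0, 1, 1, 1]
print(flip_half_open(box, [1, 1, 1], 0))  # [0, 0, 0, 1, 1, 1]
print(flip_half_open(box, [0, 0, 0], 0))  # [-1, 0, 0, 0, 1, 1]
```

With a spatial size of 1 the box maps onto itself, while a spatial size of 0 reduces the flip to a plain negation of the two bounds, which reproduces the first apply_affine_to_boxes output shown earlier.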

This comment in the discussion about the issue might also provide some helpful context.

Thanks.

May 06 '24 08:05 KumoLiu

> The spatial_size parameter dictates the axis along which you intend to flip.

I'm sorry if I misunderstand anything, but isn't the spatial_size parameter literally indicating the size of the image that the boxes are on?

https://github.com/Project-MONAI/MONAI/blob/e1a69b03c86ce065db2816b696ea4a6b57d46435/monai/apps/detection/transforms/box_ops.py#L161-L169

May 06 '24 08:05 function2-llx

Hi @function2-llx,

> but isn't the spatial_size parameter literally indicating the size of the image that the boxes are on?

Yes, for flip_boxes and for detection. However, boxes can also be just boxes: geometric data can exist without a reference image. So flip_boxes here is only used for detection applications.

May 06 '24 09:05 KumoLiu
