keras-cv icon indicating copy to clipboard operation
keras-cv copied to clipboard

bounding box conversions fail with images of different sizes

Open martin-gorner opened this issue 2 years ago • 16 comments

internally b/242193874

repro code:

images = tf.ragged.constant([np.ones(shape=[20,30,3]), # 2 images
                             np.ones(shape=[25,15,3])], ragged_rank=2)
print("Images shape:", images.shape)
boxes = tf.ragged.constant([[[3,5,15,25], [10,2,12,4], [1,9,7,19]], # 3 boxes
                            [[1,10,24,14], [5,6,14,16]]], ragged_rank=1) # 2 boxes
print("Boxes shape:", boxes.shape)
# boxes originally in "xyxy" format, converting to "rel_xyxy"
kcv.bounding_box.convert_format(boxes, source="xywh", target="rel_xyxy", images=images)

output is :

Images shape: (2, None, None, 3)
Boxes shape: (2, None, 4)

but the bbox conversion fails with error:

ValueError: Index 1 is not uniform

Looking at the code, a single width and a single height are extracted from the parameter "images" which is wrong. Multiple images can have multiple sizes:

def _xyxy_to_rel_xyxy(boxes, images=None):
    if images is None:
        raise RequiresImagesException()
    shape = tf.shape(images)
==> height, width = shape[1], shape[2]
    height, width = tf.cast(height, boxes.dtype), tf.cast(width, boxes.dtype)
    left, top, right, bottom, rest = tf.split(
        boxes, [1, 1, 1, 1, boxes.shape[-1] - 4], axis=-1
    )
    left, right = left / width, right / width
    top, bottom = top / height, bottom / height
    return tf.concat(
        [left, top, right, bottom, rest],
        axis=-1,
    )

martin-gorner avatar Sep 26 '22 12:09 martin-gorner

check https://github.com/keras-team/keras-cv/pull/848

bhack avatar Sep 26 '22 23:09 bhack

@martin-gorner What do you need here?

bhack avatar Oct 05 '22 11:10 bhack

Please check with Ragged bounding boxes! The repro code on top of this issue still fails. Also, please add ragged bounding boxes to unit tests.

Note: The failure is on multiplications:

def _rel_xyxy_to_xyxy(boxes, images=None, image_shape=None):
...     
    left, right = left*image_width, right*image_width
...

I have been able to get it to work with this, but it was trial and error, not actual understanding of what I am doing🤪. Please check!

def _rel_xyxy_to_xyxy(boxes, images=None, image_shape=None):
... 
    def mul_dims(box_dim, image_dim):
        box_dim = tf.squeeze(box_dim, axis=-1)
        box_dim = box_dim * image_dim
        box_dim = tf.expand_dims(box_dim, axis=-1)
        return box_dim
    
    left, right = mul_dims(left,image_width), mul_dims(right,image_width)
    top, bottom = mul_dims(top, image_height), mul_dims(bottom, image_height)
... 

martin-gorner avatar Oct 05 '22 11:10 martin-gorner

Thanks, I will try to take a look at this case

bhack avatar Oct 05 '22 11:10 bhack

Also, please add ragged bounding boxes to unit tests.

We had already this test from the internal team. I've just used the same for ragged_images:

https://github.com/keras-team/keras-cv/blob/f77d0cb6c0e221c2976555f1387e2aad54cc0257/keras_cv/bounding_box/converters_test.py#L158-L167

And it was passing.

The problem instead is that the testing input it was already not covering the uses cases we want. As you can see it supposes just 1 bounding box x image:

https://github.com/keras-team/keras-cv/blob/f77d0cb6c0e221c2976555f1387e2aad54cc0257/keras_cv/bounding_box/converters_test.py#L23-L24

bhack avatar Oct 05 '22 12:10 bhack

Yes, I noticed.

martin-gorner avatar Oct 05 '22 12:10 martin-gorner

Can you express your input case with a tf.constant?

As in the original test design the same input was used for non ragged tests and for the ragged is transformed on the fly by _raggify:

https://github.com/keras-team/keras-cv/blob/f77d0cb6c0e221c2976555f1387e2aad54cc0257/keras_cv/bounding_box/converters_test.py#L181-L182

bhack avatar Oct 05 '22 14:10 bhack

Mhh we cannot do that for your case as it is valid only as a ragged tensor.

Waiting for @LukeWood about how he would refactor this test without breaking too much things related to the original design.

bhack avatar Oct 05 '22 14:10 bhack

Mhh we cannot do that for your case as it is valid only as a ragged tensor.

Waiting for @LukeWood about how he would refactor this test without breaking too much things related to the original design.

I see.

If our test structure cannot reproduce a legitimate failure, then we should definitely refactor the tests.

LukeWood avatar Oct 05 '22 14:10 LukeWood

Can you clarify the original _raggify?

https://github.com/keras-team/keras-cv/pull/848/files#diff-0424266252dea39b6c0abc32eca23cffb6a33fd12a1efab4f682dd509508ea4bL130

What input format do we need to support?

bhack avatar Oct 05 '22 16:10 bhack

With the original _raggify with the standard test input:

xyxy_box = tf.constant([[[10, 20, 110, 120], [20, 30, 120, 130]]], dtype=tf.float32)
def _raggify(tensor, row_lengths=[[2, 0], [0, 0]]):
    return tf.RaggedTensor.from_row_lengths(tensor[0], [2, 0])
<tf.RaggedTensor [[[10.0, 20.0, 110.0, 120.0],
  [20.0, 30.0, 120.0, 130.0]], []]>

Please note []. Also, why we have a row_lengths arg if it is not used in the implementation?

bhack avatar Oct 05 '22 16:10 bhack

What input format do we need to support? For bounding boxes, the norm is to have a variable number of bounding boxes for each image. The norm is also to process images in batches. So the following is a pretty normal batch of bounding boxes:

# batch size = 2
boxes = tf.ragged.constant([[[3,5,15,25], [10,2,12,4], [1,9,7,19]], # 3 boxes for first image
                            [[1,10,24,14], [5,6,14,16]]], ragged_rank=1) # 2 boxes for second image

martin-gorner avatar Oct 06 '22 07:10 martin-gorner

So what was the scope of the original _raggify [2, 0]?

Was it to simulate an empty result?

bhack avatar Oct 06 '22 07:10 bhack

Asking to @atuleu as he implemented the initial version of _raggify.

bhack avatar Oct 06 '22 16:10 bhack

So what was the scope of the original _raggify [2, 0]?

Was it to simulate an empty result?

I believe this was in fact to simulate an empty result.

We should rely on tf.ragged.constant in our unit tests to get the widest array of coverage.

In hindsight, I think using these helper methods was likely a design mistake.

LukeWood avatar Oct 07 '22 10:10 LukeWood

I believe this was in fact to simulate an empty result.

Ok, but it still seems to me incorrect if you see the original input with the original result I've posted at https://github.com/keras-team/keras-cv/issues/841#issuecomment-1268682255

Can you confirm what inputs do we want to test?

I make some examples:

images = tf.ragged.constant([np.ones(shape=[100,100,3]), # 2 images
                             np.ones(shape=[50,50,3])], ragged_rank=2)

boxes_3_2 = tf.ragged.constant([[[3,5,15,25], [10,2,12,4], [1,9,7,19]], # 3 boxes
                            [[1,10,24,14], [5,6,14,16]]], ragged_rank=1) # 2 boxes

boxes_1_1 = tf.ragged.constant([[10, 20, 110, 120], [[20, 30, 120, 130]]], ragged_rank=1)

# what else?

bhack avatar Oct 07 '22 14:10 bhack

@martin-gorner Are you still interested in this?

There is a connected PR/colab.

bhack avatar Nov 07 '22 23:11 bhack

Absolutely still interested. Thank you for your contributions @bhack ! I'll have the team look at your PR asap.

martin-gorner avatar Nov 10 '22 13:11 martin-gorner

I tested. The fix looks good to me. Thanks!

martin-gorner avatar Nov 28 '22 14:11 martin-gorner

Do I close my PR or are you interested?

bhack avatar Nov 28 '22 18:11 bhack