`tfio.experimental.image.draw_bounding_boxes` has inconsistent shape constraints: `DrawBoundingBoxesV3Op` can essentially only draw one text output per image
I think this issue was first mentioned in https://github.com/tensorflow/io/issues/1088.
However, it was labelled as an enhancement, although it looks more like a limitation imposed by the current implementation.
A simple example that works is having a single box inside a single image, with a single color code and a single text output:
import tensorflow as tf
import tensorflow_io as tfio
width = 560
height = 320
channels = 3
images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)
boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9]]], dtype=tf.float32)
texts = tf.constant(["hello_world!"], dtype=tf.string)
colors = tf.constant([[255, 0, 0]], dtype=tf.float32)
print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Texts:", texts.shape)
print("Colors:", colors.shape)
output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
print("Output:", output.shape)
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 1, 4)
Texts: (1,)
Colors: (1, 3)
Output: (1, 320, 560, 3)
We already know from tensorflow/io/tensorflow_io/core/kernels/image_font_kernels.cc that there is no point in trying without a batch dimension, because there is a check that the image rank is 4:
OP_REQUIRES(context, images.dims() == 4,
errors::InvalidArgument("The rank of the images should be 4"));
This is also what PR https://github.com/tensorflow/io/pull/254 by @yongtang, which added this feature, demonstrates.
It is also in the same spirit as the one test available in the code at tensorflow/io/tests/test_image.py.
Now, still within a batch size of 1 (one image), we could have more boxes, each with their own text labels and colors, but this does not work:
import tensorflow as tf
import tensorflow_io as tfio
width = 560
height = 320
channels = 3
images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)
boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]]], dtype=tf.float32)
texts = tf.constant(["hello_world!", "hello_world_part_2"], dtype=tf.string)
colors = tf.constant([[255, 0, 0], [0, 255, 0]], dtype=tf.float32)
print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Texts:", texts.shape)
print("Colors:", colors.shape)
output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
print("Output:", output.shape)
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 2, 4)
Texts: (2,)
Colors: (2, 3)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
[<ipython-input-1-3eebb25ba193>](https://localhost:8080/#) in <cell line: 22>()
20 print("Colors:", colors.shape)
21
---> 22 output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
23 print("Output:", output.shape)
1 frames
<string> in io_draw_bounding_boxes_v3(images, boxes, colors, texts, font_size, name)
[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py](https://localhost:8080/#) in raise_from_not_ok_status(e, name)
5881 def raise_from_not_ok_status(e, name) -> NoReturn:
5882 e.message += (" name: " + str(name if name is not None else ""))
-> 5883 raise core._status_to_exception(e) from None # pylint: disable=protected-access
5884
5885
InvalidArgumentError: {{function_node __wrapped__IO>DrawBoundingBoxesV3_device_/job:localhost/replica:0/task:0/device:CPU:0}} The batch sizes should be the same [Op:IO>DrawBoundingBoxesV3] name:
The message `batch sizes should be the same` refers to the batch sizes of images and texts, which are required to match in tensorflow/io/tensorflow_io/core/kernels/image_font_kernels.cc:
OP_REQUIRES(
context, images.dim_size(0) == texts_tensor.dim_size(0),
errors::InvalidArgument("The batch sizes should be the same"));
Yet, interestingly, the same is not required for colors.
Okay, then let's try to make the texts batch size match the image batch size, as the op requires:
import tensorflow as tf
import tensorflow_io as tfio
width = 560
height = 320
channels = 3
images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)
boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]]], dtype=tf.float32)
texts = tf.constant(["hello_world!", "hello_world_part_2"], dtype=tf.string)
colors = tf.constant([[255, 0, 0], [0, 255, 0]], dtype=tf.float32)
# let's also expand for text
texts = tf.expand_dims(texts, axis=0)
print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Texts:", texts.shape)
print("Colors:", colors.shape)
output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
print("Output:", output.shape)
But then we hit this error:
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 2, 4)
Texts: (1, 2)
Colors: (2, 3)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
[<ipython-input-2-f83aad0cb29d>](https://localhost:8080/#) in <cell line: 25>()
23 print("Colors:", colors.shape)
24
---> 25 output = tfio.experimental.image.draw_bounding_boxes(images, boxes, texts, colors)
26 print("Output:", output.shape)
1 frames
<string> in io_draw_bounding_boxes_v3(images, boxes, colors, texts, font_size, name)
[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py](https://localhost:8080/#) in raise_from_not_ok_status(e, name)
5881 def raise_from_not_ok_status(e, name) -> NoReturn:
5882 e.message += (" name: " + str(name if name is not None else ""))
-> 5883 raise core._status_to_exception(e) from None # pylint: disable=protected-access
5884
5885
InvalidArgumentError: {{function_node __wrapped__IO>DrawBoundingBoxesV3_device_/job:localhost/replica:0/task:0/device:CPU:0}} The rank of the texts tensor should be 1 [Op:IO>DrawBoundingBoxesV3] name:
The message `The rank of the texts tensor should be 1` comes from another check in the same file, tensorflow/io/tensorflow_io/core/kernels/image_font_kernels.cc:
OP_REQUIRES(context, texts_tensor.dims() == 1,
errors::InvalidArgument(
"The rank of the texts tensor should be 1"));
But does it work with colors only and no text? Yes:
import tensorflow as tf
import tensorflow_io as tfio
width = 560
height = 320
channels = 3
images = tf.random.uniform((height, width, channels), dtype=tf.float32)
images = tf.expand_dims(images, axis=0)
boxes = tf.constant([[[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]]], dtype=tf.float32)
colors = tf.constant([[255, 0, 0], [0, 255, 0]], dtype=tf.float32)
print("Shapes of inputs:")
print("Images:", images.shape)
print("Boxes:", boxes.shape)
print("Colors:", colors.shape)
output = tfio.experimental.image.draw_bounding_boxes(images, boxes, None, colors)
print("Output:", output.shape)
Shapes of inputs:
Images: (1, 320, 560, 3)
Boxes: (1, 2, 4)
Colors: (2, 3)
Output: (1, 320, 560, 3)
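To make the conflict between the quoted checks easier to see, here is a minimal pure-Python model of them. It is only a sketch: the function name `texts_shape_error` is made up for illustration, the order in which the kernel runs its checks is an assumption, and it only covers the case where texts are actually passed (not the `None` case above).

```python
def texts_shape_error(images_shape, texts_shape):
    """Return the first failing message, or None if all modeled checks pass.

    Models only the three OP_REQUIRES checks quoted in this issue.
    The order of the checks is an assumption, not taken from the kernel.
    """
    # images.dims() == 4
    if len(images_shape) != 4:
        return "The rank of the images should be 4"
    # texts_tensor.dims() == 1
    if len(texts_shape) != 1:
        return "The rank of the texts tensor should be 1"
    # images.dim_size(0) == texts_tensor.dim_size(0)
    if images_shape[0] != texts_shape[0]:
        return "The batch sizes should be the same"
    return None


images_shape = (1, 320, 560, 3)
# The three texts shapes tried in this issue for a single image.
for texts_shape in [(1,), (2,), (1, 2)]:
    print(texts_shape, "->", texts_shape_error(images_shape, texts_shape))
```

Under this model, a rank-1 texts tensor whose length must equal the image batch size leaves room for exactly one string per image, which matches the errors shown above.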
To sum up, I think this is a limitation of the current implementation: since per-box values work for colors, they should also work for texts. I could not spot anything that would force only one text per image.
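Until that is addressed, one possible workaround for a single image is to draw the boxes one at a time, so that every call has the one-box, one-text, one-color shape that the first example shows to work. The sketch below is untested against tfio itself: `draw_boxes_one_at_a_time` and `fake_draw` are hypothetical names, and the drawing function is passed in so the loop can be exercised here with a stand-in.

```python
def draw_boxes_one_at_a_time(draw_fn, images, boxes, texts, colors):
    """Call draw_fn once per box, feeding each result into the next call.

    images: a batch of exactly one image, in whatever form draw_fn accepts.
    boxes, texts, colors: plain Python lists with one entry per box.
    """
    if not (len(boxes) == len(texts) == len(colors)):
        raise ValueError("boxes, texts and colors must have the same length")
    for box, text, color in zip(boxes, texts, colors):
        # Shapes per call: boxes (1, 1, 4), texts (1,), colors (1, 3).
        images = draw_fn(images, [[box]], [text], [color])
    return images


def fake_draw(images, boxes, texts, colors):
    # Stand-in for the real op: records the label instead of drawing it.
    return images + [texts[0]]


# Hypothetical adapter for the real op (not executed here):
#   draw_fn = lambda img, b, t, c: tfio.experimental.image.draw_bounding_boxes(
#       img,
#       tf.constant(b, dtype=tf.float32),
#       tf.constant(t, dtype=tf.string),
#       tf.constant(c, dtype=tf.float32))

result = draw_boxes_one_at_a_time(
    fake_draw,
    [],
    [[0.1, 0.2, 0.5, 0.9], [0.3, 0.3, 0.6, 0.6]],
    ["hello_world!", "hello_world_part_2"],
    [[255, 0, 0], [0, 255, 0]],
)
print(result)
```

This costs one op call per box and only covers a batch of one image, so it is a stopgap rather than a replacement for proper per-box text support in the kernel.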
If you agree, I would volunteer to help with an attempt at a fix. @yongtang @terrytangyuan
Here is a link to the demo notebook with the above cells: https://colab.research.google.com/drive/1rSder84urmOGF21rtWGb7TDEu-7zq1MP?usp=sharing