RetinaNet predict() is too slow
After upgrading keras-cv to v0.6.1, I noticed that the predict() method of the RetinaNet model became really slow compared with v0.5.1, which complicates COCO metrics evaluation.
inputs = [SINGLE_IMAGE]
model = keras_cv.models.RetinaNet.from_preset(...)
model.predict(inputs)
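For anyone reproducing: a fully runnable sketch of the above (the preset name and the dummy input are assumptions, since the original snippet elides them):

import numpy as np
import keras_cv

# Assumed preset; the original uses from_preset(...) with an unspecified preset.
model = keras_cv.models.RetinaNet.from_preset("retinanet_resnet50_pascalvoc")
# Dummy single-image batch standing in for SINGLE_IMAGE.
inputs = np.random.uniform(0, 255, size=(1, 640, 640, 3)).astype("float32")
model.predict(inputs)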
When getting predictions for a single image in 0.6.1: 1/1 [==============================] - 42s 42s/step
And in 0.5.1: 1/1 [==============================] - 6s 6s/step
Another problem with predictions is that predict() throws an exception when a generator is passed as an argument. Again, in 0.5.1 this worked perfectly.
y_pred = model.predict(image_generator(...))
TypeError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2169, in predict_function *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2155, in step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2143, in run_step **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras_cv/models/object_detection/retinanet/retinanet.py", line 248, in predict_step
        return self.decode_predictions(outputs, args[-1])
    File "/usr/local/lib/python3.10/dist-packages/keras_cv/models/object_detection/retinanet/retinanet.py", line 289, in decode_predictions
        anchors = self.anchor_generator(image_shape=image_shape)
    File "/usr/local/lib/python3.10/dist-packages/keras_cv/layers/object_detection/anchor_generator.py", line 179, in __call__
        generator(image_shape),
    File "/usr/local/lib/python3.10/dist-packages/keras_cv/layers/object_detection/anchor_generator.py", line 264, in __call__
        0.5 * stride, math.ceil(image_width / stride) * stride, stride

    TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
I'd stick with the older version until the issue is resolved, but unfortunately COCO metrics seem to be broken in 0.5.1, as reported in the following issue: https://github.com/keras-team/keras-cv/issues/1994
Colab for reproducing the issue: https://colab.research.google.com/drive/1dzJFiVIxXtJCoj-ShjRyu-ZPkf6ClCdj?usp=sharing
Thanks for reporting the issue. keras-cv was updated to support keras-core along with the existing tf.keras backend, and that change may have introduced this regression. I will take a look and get back.
On the above error, you could wrap your generator function in tf.data.Dataset.from_generator and provide it the proper shape of the image -
import tensorflow as tf
from functools import partial

def image_generator(file_path):
    image_batch = preprocess_image(file_path)
    for i in range(3):
        yield image_batch

# Wrapping the generator in a tf.data.Dataset with explicit output_shapes
# gives predict() the static image shape it needs.
ds = tf.data.Dataset.from_generator(
    partial(image_generator, image_path),
    output_types=tf.float32,
    output_shapes=(None, 640, 640, 3),
)
y_pred = model.predict(ds)
This will resolve the error.
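For context, the decode step builds anchors from the static image shape, and a plain Python generator hands predict() tensors whose static dimensions are None, which is what trips the stride arithmetic in the traceback above. A small illustration of the difference (not the exact keras-cv code path):

import tensorflow as tf

def gen():
    for _ in range(3):
        yield tf.zeros((1, 640, 640, 3))

# Without output_shapes, every static dimension is unknown.
ds_no_shape = tf.data.Dataset.from_generator(gen, output_types=tf.float32)
print(ds_no_shape.element_spec.shape)  # <unknown>

# With output_shapes, the static image shape is available to the model.
ds_with_shape = tf.data.Dataset.from_generator(
    gen, output_types=tf.float32, output_shapes=(None, 640, 640, 3))
print(ds_with_shape.element_spec.shape)  # (None, 640, 640, 3)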
Thanks for the help, the dataset trick worked fine.
Regarding the slow prediction: it becomes much faster after assigning a custom NMS, with any thresholds:
model.prediction_decoder = keras_cv.layers.MultiClassNonMaxSuppression(
    bounding_box_format="xywh",
    from_logits=True,
    iou_threshold=1.0,
    confidence_threshold=0.0,
)
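For reference, the decoder being replaced here is the model's default prediction_decoder, which the investigation further down identifies as the single-class NMS layer. A quick sketch to check which decoder a model is using:

# Print the class of the active decoder before/after the swap above.
print(type(model.prediction_decoder).__name__)
# Assumed for 0.6.x: NonMaxSuppression before the swap,
# MultiClassNonMaxSuppression after.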
However, the custom decoder cannot be used for training, because the COCO metrics validation callback raises an exception. I'm new to object detection, so I'm not sure whether this is expected behavior.
Traceback (most recent call last):
  File ".../object_detection/model.py", line 570, in <module>
    run_training()
  File ".../object_detection/model.py", line 334, in run_training
    model.fit(
  File "/Users/axel/miniforge3/envs/ml/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/axel/miniforge3/envs/ml/lib/python3.11/site-packages/keras_cv/callbacks/pycoco_callback.py", line 132, in on_epoch_end
    metrics = compute_pycoco_metrics(ground_truth, predictions)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/miniforge3/envs/ml/lib/python3.11/site-packages/keras_cv/metrics/coco/pycoco_wrapper.py", line 219, in compute_pycoco_metrics
    coco_predictions = _convert_predictions_to_coco_annotations(predictions)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axel/miniforge3/envs/ml/lib/python3.11/site-packages/keras_cv/metrics/coco/pycoco_wrapper.py", line 131, in _convert_predictions_to_coco_annotations
    predictions["detection_boxes"][i][j]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
I have this validation callback in my code:
keras_cv.callbacks.PyCOCOCallback(validation_data=eval_ds, bounding_box_format="xywh", cache=False)
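For completeness, a sketch of how that callback attaches to fit() (the dataset names and epoch count are assumptions):

import keras_cv

# train_ds / eval_ds are assumed tf.data.Datasets of (images, bounding_boxes).
callback = keras_cv.callbacks.PyCOCOCallback(
    validation_data=eval_ds, bounding_box_format="xywh", cache=False
)
model.fit(train_ds, epochs=10, callbacks=[callback])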
@axelusarov is the slower prediction just for the first step (e.g. due to graph tracing), or is it also that every step is slower?
I suppose that the single-class NMS is the likely issue here in terms of performance.
I don't think this error is expected behavior -- this looks like a bug. I believe that #2030 should fix it. Sorry for the regression and thanks for the issue report!
@ianstenbit prediction is slow for each step. Here is a predict() example for 5 images, before and after assigning a new MultiClassNonMaxSuppression with default parameters to the model.
(ml) % python test.py
Using TensorFlow backend
5/5 [==============================] - 121s 23s/step
(ml) % python test.py
Using TensorFlow backend
5/5 [==============================] - 2s 119ms/step
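For anyone who wants to reproduce the comparison, a sketch of the kind of script used (the preset and dummy images are assumptions):

import numpy as np
import keras_cv

model = keras_cv.models.RetinaNet.from_preset("retinanet_resnet50_pascalvoc")
images = np.random.uniform(0, 255, size=(5, 640, 640, 3)).astype("float32")

# Run 1: default (single-class) decoder.
model.predict(images, batch_size=1)

# Run 2: after swapping in the multi-class decoder with default parameters.
model.prediction_decoder = keras_cv.layers.MultiClassNonMaxSuppression(
    bounding_box_format="xywh", from_logits=True
)
model.predict(images, batch_size=1)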
After using the code from master, the validation step is much faster now when using the custom NMS, and I didn't notice any errors this time. I didn't check training quality, though.
Yeah, it looks like there's some significant slowdown with the single-class NMS for TensorFlow, and it's probably not just due to graph tracing. This is something we should look into further, but I can't prioritize it right now.
I was about to write a GitHub issue about this, but yes, single-class NMS is much slower than multi-class. And it's not just prediction; training is 10x slower. I haven't spent much time looking at the source code, graph tracing, etc., but I can take a look to see what is going on.
After looking into this, it seems that NonMaxSuppression calls tf.image.non_max_suppression_padded() in image_ops_impl.py. This calls non_max_suppression_padded_v2 in the same file, which conducts the NMS in pure Python-level ops. This is in contrast to MultiClassNonMaxSuppression, which calls tf.image.combined_non_max_suppression() in image_ops_impl.py; that in turn calls gen_image_ops.combined_non_max_suppression(), which I believe runs the NMS in C++. This is what makes the 10x speedup possible. Does anyone have any idea why this was done? For reference, here is tf.image.non_max_suppression_padded():
@tf_export('image.non_max_suppression_padded')
@dispatch.add_dispatch_support
def non_max_suppression_padded(boxes,
                               scores,
                               max_output_size,
                               iou_threshold=0.5,
                               score_threshold=float('-inf'),
                               pad_to_max_output_size=False,
                               name=None,
                               sorted_input=False,
                               canonicalized_coordinates=False,
                               tile_size=512):
  with ops.name_scope(name, 'non_max_suppression_padded'):
    if not pad_to_max_output_size:
      # pad_to_max_output_size may be set to False only when the shape of
      # boxes is [num_boxes, 4], i.e., a single image. We make best effort to
      # detect violations at compile time. If `boxes` does not have a static
      # rank, the check allows computation to proceed.
      if boxes.get_shape().rank is not None and boxes.get_shape().rank > 2:
        raise ValueError("'pad_to_max_output_size' (value {}) must be True for "
                         'batched input'.format(pad_to_max_output_size))
    if name is None:
      name = ''
    # idx, num_valid = non_max_suppression_padded_v2(
    #     boxes, scores, max_output_size, iou_threshold, score_threshold,
    #     sorted_input, canonicalized_coordinates, tile_size)
    idx, num_valid = non_max_suppression_padded_v1(
        boxes, scores, max_output_size, iou_threshold, score_threshold,
        pad_to_max_output_size, name)
    # def_function.function seems to lose shape information, so set it here.
    if not pad_to_max_output_size:
      idx = idx[0, :num_valid]
    else:
      batch_dims = array_ops.concat([
          array_ops.shape(boxes)[:-2],
          array_ops.expand_dims(max_output_size, 0)
      ], 0)
      idx = array_ops.reshape(idx, batch_dims)
    return idx, num_valid
# TODO(b/158709815): Improve performance regression due to
# def_function.function.
# @def_function.function(
#     experimental_implements='non_max_suppression_padded_v2')
def non_max_suppression_padded_v2(boxes,
                                  scores,
                                  max_output_size,
                                  iou_threshold=0.5,
                                  score_threshold=float('-inf'),
                                  sorted_input=False,
                                  canonicalized_coordinates=False,
                                  tile_size=512):
  def _sort_scores_and_boxes(scores, boxes):
    with ops.name_scope('sort_scores_and_boxes'):
      sorted_scores_indices = sort_ops.argsort(
          scores, axis=1, direction='DESCENDING')
      sorted_scores = array_ops.gather(
          scores, sorted_scores_indices, axis=1, batch_dims=1
      )
      sorted_boxes = array_ops.gather(
          boxes, sorted_scores_indices, axis=1, batch_dims=1
      )
    return sorted_scores, sorted_boxes, sorted_scores_indices

  batch_dims = array_ops.shape(boxes)[:-2]
  num_boxes = array_ops.shape(boxes)[-2]
  boxes = array_ops.reshape(boxes, [-1, num_boxes, 4])
  scores = array_ops.reshape(scores, [-1, num_boxes])
  batch_size = array_ops.shape(boxes)[0]
  if score_threshold != float('-inf'):
    with ops.name_scope('filter_by_score'):
      score_mask = math_ops.cast(scores > score_threshold, scores.dtype)
      scores *= score_mask
      box_mask = array_ops.expand_dims(
          math_ops.cast(score_mask, boxes.dtype), 2)
      boxes *= box_mask

  if not canonicalized_coordinates:
    with ops.name_scope('canonicalize_coordinates'):
      y_1, x_1, y_2, x_2 = array_ops.split(
          value=boxes, num_or_size_splits=4, axis=2)
      y_1_is_min = math_ops.reduce_all(
          math_ops.less_equal(y_1[0, 0, 0], y_2[0, 0, 0]))
      y_min, y_max = tf_cond.cond(
          y_1_is_min, lambda: (y_1, y_2), lambda: (y_2, y_1))
      x_1_is_min = math_ops.reduce_all(
          math_ops.less_equal(x_1[0, 0, 0], x_2[0, 0, 0]))
      x_min, x_max = tf_cond.cond(
          x_1_is_min, lambda: (x_1, x_2), lambda: (x_2, x_1))
      boxes = array_ops.concat([y_min, x_min, y_max, x_max], axis=2)
  # TODO(@bhack): https://github.com/tensorflow/tensorflow/issues/56089
  # this will be required after deprecation
  # else:
  #   y_1, x_1, y_2, x_2 = array_ops.split(
  #       value=boxes, num_or_size_splits=4, axis=2)

  if not sorted_input:
    scores, boxes, sorted_indices = _sort_scores_and_boxes(scores, boxes)
  else:
    # Default value required for Autograph.
    sorted_indices = array_ops.zeros_like(scores, dtype=dtypes.int32)

  pad = math_ops.cast(
      math_ops.ceil(
          math_ops.cast(
              math_ops.maximum(num_boxes, max_output_size), dtypes.float32) /
          math_ops.cast(tile_size, dtypes.float32)),
      dtypes.int32) * tile_size - num_boxes
  boxes = array_ops.pad(
      math_ops.cast(boxes, dtypes.float32), [[0, 0], [0, pad], [0, 0]])
  scores = array_ops.pad(
      math_ops.cast(scores, dtypes.float32), [[0, 0], [0, pad]])
  num_boxes_after_padding = num_boxes + pad
  num_iterations = num_boxes_after_padding // tile_size

  def _loop_cond(unused_boxes, unused_threshold, output_size, idx):
    return math_ops.logical_and(
        math_ops.reduce_min(output_size) < max_output_size,
        idx < num_iterations)

  def suppression_loop_body(boxes, iou_threshold, output_size, idx):
    return _suppression_loop_body(
        boxes, iou_threshold, output_size, idx, tile_size)

  selected_boxes, _, output_size, _ = while_loop.while_loop(
      _loop_cond,
      suppression_loop_body,
      [
          boxes, iou_threshold,
          array_ops.zeros([batch_size], dtypes.int32),
          constant_op.constant(0)
      ],
      shape_invariants=[
          tensor_shape.TensorShape([None, None, 4]),
          tensor_shape.TensorShape([]),
          tensor_shape.TensorShape([None]),
          tensor_shape.TensorShape([]),
      ],
  )
  num_valid = math_ops.minimum(output_size, max_output_size)
  idx = num_boxes_after_padding - math_ops.cast(
      nn_ops.top_k(
          math_ops.cast(math_ops.reduce_any(
              selected_boxes > 0, [2]), dtypes.int32) *
          array_ops.expand_dims(
              math_ops.range(num_boxes_after_padding, 0, -1), 0),
          max_output_size)[0], dtypes.int32)
  idx = math_ops.minimum(idx, num_boxes - 1)

  if not sorted_input:
    index_offsets = math_ops.range(batch_size) * num_boxes
    gather_idx = array_ops.reshape(
        idx + array_ops.expand_dims(index_offsets, 1), [-1])
    idx = array_ops.reshape(
        array_ops.gather(array_ops.reshape(sorted_indices, [-1]),
                         gather_idx),
        [batch_size, -1])
  invalid_index = array_ops.zeros([batch_size, max_output_size],
                                  dtype=dtypes.int32)
  idx_index = array_ops.expand_dims(math_ops.range(max_output_size), 0)
  num_valid_expanded = array_ops.expand_dims(num_valid, 1)
  idx = array_ops.where(idx_index < num_valid_expanded,
                        idx, invalid_index)
  num_valid = array_ops.reshape(num_valid, batch_dims)
  return idx, num_valid
Note that the non-max suppression here is done totally manually, composed from individual ops. Interestingly enough, non_max_suppression_padded_v1 did refer to a C++ implementation; only v2 does it in Python. And here is tf.image.combined_non_max_suppression():
@tf_export('image.combined_non_max_suppression')
@dispatch.add_dispatch_support
def combined_non_max_suppression(boxes,
                                 scores,
                                 max_output_size_per_class,
                                 max_total_size,
                                 iou_threshold=0.5,
                                 score_threshold=float('-inf'),
                                 pad_per_class=False,
                                 clip_boxes=True,
                                 name=None):
  with ops.name_scope(name, 'combined_non_max_suppression'):
    iou_threshold = ops.convert_to_tensor(
        iou_threshold, dtype=dtypes.float32, name='iou_threshold')
    score_threshold = ops.convert_to_tensor(
        score_threshold, dtype=dtypes.float32, name='score_threshold')
    # Convert `max_total_size` to tensor *without* setting the `dtype` param.
    # This allows us to catch `int32` overflow case with `max_total_size`
    # whose expected dtype is `int32` by the op registration. Any number within
    # `int32` will get converted to `int32` tensor. Anything larger will get
    # converted to `int64`. Passing in `int64` for `max_total_size` to the op
    # will throw dtype mismatch exception.
    # TODO(b/173251596): Once there is a more general solution to warn against
    # int overflow conversions, revisit this check.
    max_total_size = ops.convert_to_tensor(max_total_size)
    return gen_image_ops.combined_non_max_suppression(
        boxes, scores, max_output_size_per_class, max_total_size, iou_threshold,
        score_threshold, pad_per_class, clip_boxes)
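To see the gap outside of keras-cv, here is a minimal micro-benchmark sketch comparing the two public TF APIs directly (the shapes and sizes are arbitrary assumptions; note the padded op requires pad_to_max_output_size=True for batched input):

import time
import tensorflow as tf

batch, num_boxes, num_classes = 8, 1000, 20
boxes = tf.random.uniform((batch, num_boxes, 4))
scores = tf.random.uniform((batch, num_boxes))

# Single-class padded NMS: the composed-ops implementation shown above.
start = time.perf_counter()
tf.image.non_max_suppression_padded(
    boxes, scores, max_output_size=100, pad_to_max_output_size=True)
print("non_max_suppression_padded:", time.perf_counter() - start)

# Combined NMS: dispatches straight to the fused gen_image_ops kernel.
class_scores = tf.random.uniform((batch, num_boxes, num_classes))
start = time.perf_counter()
tf.image.combined_non_max_suppression(
    boxes[:, :, tf.newaxis, :], class_scores,
    max_output_size_per_class=100, max_total_size=100)
print("combined_non_max_suppression:", time.perf_counter() - start)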
Does anyone know why the regular non-max suppression is being done in Python? This seems to imply that the fix will have to come from updating TensorFlow, not keras-cv; let me know if I'm wrong about that.
And that is still the issue. :)