keras-yolo3

Non-Max-Suppression is extremely SLOW, up to 5-7 seconds!

Open thusinh1969 opened this issue 5 years ago • 8 comments

I use yolo3_one_file_to_detect_them_all.py on a 608x608 image. It was so slow that I had to time the entire prediction process. It turned out that do_nms takes 5-7 seconds for an image that has 10 objects (person only).

I am using a Titan X on Ubuntu 16.04. All my other models predict at 30-35 fps. Any hints, please?
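One way to confirm which stage is responsible is to time each call separately. The sketch below uses only the standard library; `timed` is a hypothetical helper and `slow_stage` is a stand-in for a real call such as do_nms, so neither name comes from the script itself.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print how long it took, return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed

def slow_stage(n):
    # Stand-in for a pipeline stage; replace with the real call being measured.
    return sum(i * i for i in range(n))

result, seconds = timed("slow_stage", slow_stage, 100_000)
```

Wrapping the network forward pass, the box decoding, and the suppression step one at a time should show whether the time is spent in the model or in the Python post-processing.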

Thank you. Steve

thusinh1969 avatar Mar 21 '19 07:03 thusinh1969

Same problem. There are 10000 boxes generated, and do_nms runs for-loops over them instead of using vectorized operations. I found this tutorial and will try to integrate its implementation into do_nms.
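For illustration, a minimal sketch of what a vectorized greedy NMS could look like in NumPy. It is not the tutorial's code or the repository's do_nms: the function name `nms_vectorized` and the `[xmin, ymin, xmax, ymax]` array layout are assumptions made for this example, and it is class-agnostic.

```python
import numpy as np

def nms_vectorized(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes: (N, 4) array of [xmin, ymin, xmax, ymax]; scores: (N,).
    Returns the indices of the kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # candidate indices, best score first

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of the current best box against all remaining boxes at once
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        union = areas[i] + areas[rest] - inter
        iou = np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)
        # drop every remaining box that overlaps the kept one too much
        order = rest[iou < iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 50],       # box 0
                  [12, 12, 52, 52],       # box 1: heavy overlap with box 0
                  [100, 100, 150, 150]])  # box 2: separate object
scores = np.array([0.9, 0.8, 0.7])
kept = nms_vectorized(boxes, scores, iou_thresh=0.5)
print(kept)
```

The outer loop still runs once per kept box, but the IoU against all remaining candidates is computed in a single array operation instead of a nested Python loop. If per-class suppression is wanted, the function can be called once per class with that class's scores.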

alpotapov avatar Aug 28 '19 16:08 alpotapov

decode_netout doesn't filter boxes by threshold. Try replacing if(objectness.all() <= obj_thresh): continue with if (objectness <= obj_thresh).all(): continue (line 302).
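A small demonstration of why the order of the comparison and .all() matters, assuming objectness is a single NumPy float at that point in decode_netout (the value 0.01 below is made up for the example):

```python
import numpy as np

obj_thresh = 0.5
objectness = np.float32(0.01)  # a low-confidence box that should be skipped

# Original test: .all() collapses the score to True first, and True <= 0.5
# is False, so the box is never skipped.
skip_original = bool(objectness.all() <= obj_thresh)

# Suggested fix: compare against the threshold first, then reduce.
skip_fixed = bool((objectness <= obj_thresh).all())

print(skip_original, skip_fixed)
```

With the original test, essentially every candidate box survives decoding, which would explain why do_nms ends up looping over thousands of boxes.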

AlexM4 avatar Sep 19 '19 10:09 AlexM4

@AlexM4 That is a MASSIVE improvement!

FlorinAndrei avatar Oct 02 '19 05:10 FlorinAndrei

@alpotapov Have you found a better / faster version of do_nms() ?

FlorinAndrei avatar Oct 02 '19 09:10 FlorinAndrei

> @alpotapov Have you found a better / faster version of do_nms() ?

I used tf.image.non_max_suppression from TensorFlow 2. This was fairly speedy.

firefly2442 avatar May 21 '20 18:05 firefly2442

> > @alpotapov Have you found a better / faster version of do_nms() ?
>
> I used tf.image.non_max_suppression from TensorFlow 2. This was fairly speedy.

Can you share the code showing how you used tf.image.non_max_suppression?

AiTeamVusolutions avatar Aug 16 '20 13:08 AiTeamVusolutions

> > > @alpotapov Have you found a better / faster version of do_nms() ?
> >
> > I used tf.image.non_max_suppression from TensorFlow 2. This was fairly speedy.
>
> Can you share the code showing how you used tf.image.non_max_suppression?

I believe this is it. I've since switched to the yolov3-tf2 codebase, which implements most of the items in TensorFlow. The raw Python I was using before wasn't quite as speedy.

firefly2442 avatar Aug 16 '20 15:08 firefly2442

I wrote a custom do_nms function and got a modest speedup; maybe this will be helpful for someone. I am very new to the whole ML scene, and to programming in general, so please feel free to correct me if there are any issues or improvements I can make.

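import numpy as np
import tensorflow as tf

def do_nms(boxes, scores, threshold, max_boxes=10):
    # boxes: [N, 4] as (ymin, xmin, ymax, xmax); scores: [N]
    boxes = tf.convert_to_tensor(boxes, dtype=tf.float32)
    scores = tf.convert_to_tensor(scores, dtype=tf.float32)
    selected_indices = tf.image.non_max_suppression(
        boxes, scores, max_boxes, iou_threshold=threshold)
    selected_boxes = tf.gather(boxes, selected_indices)

    # also return the kept indices so labels and scores can be matched to the kept boxes
    return selected_boxes.numpy().astype(int), selected_indices.numpy()

while True:
...
...
...
...

    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)

    # pack the box corners into an [N, 4] array in the order TensorFlow expects
    coords = np.empty([len(v_boxes), 4])
    for i in range(len(v_boxes)):
        coords[i] = [v_boxes[i].ymin, v_boxes[i].xmin,
                     v_boxes[i].ymax, v_boxes[i].xmax]

    s_boxes, keep = do_nms(coords, v_scores, 0.5)

    num_preds = len(s_boxes)
    print(num_preds)

    # summarize what we found, using the indices that survived NMS
    for i in keep:
        print(v_labels[i], v_scores[i])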

zeeshanbasar avatar Jun 28 '21 08:06 zeeshanbasar