
Low FPS with visualisation

Open Eoinoe opened this issue 7 years ago • 9 comments

The FPS that my Jetson TX2 outputs fluctuates a lot when I set 'visualize' to true in the config.yml file. The FPS drops significantly (down to 7-8 FPS from 33-35 FPS) when many objects are present. I was wondering if anyone else has experienced this and knows where the bottleneck is. I changed the object_detection.py script slightly so that it can use a video as input, and the drop in frames does not occur when I set 'visualize' to false (in the config.yml). I understand that visualising the output will reduce the FPS, but I was not expecting it to be so drastic. It would be greatly appreciated if someone could explain why the drop is so large, or how to reduce it.

Eoinoe avatar May 28 '18 11:05 Eoinoe

Hi, Eoinoe

Because the camera capture uses OpenCV but the visualization uses PIL. OpenCV works on numpy arrays, while PIL cannot use numpy arrays directly, so the visualization has to convert the data on every call. This conversion is very slow.

draw_bounding_box_on_image_array() function in visualization_utils.py:

image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
np.copyto(image, np.array(image_pil))

To make matters worse, these conversions are called once per detected object inside the visualize_boxes_and_labels_on_image_array() function.

performance check:

import numpy as np
import cv2  # reads the image as a numpy array
import PIL.Image as Image
import time

filename = 'sample300x300.jpg'
print(filename)
image = cv2.imread(filename, cv2.IMREAD_UNCHANGED)
print(type(image))
num_loop = 30 * 20  # one second of frames: 30 FPS with 20 objects each

start_time = time.time()
for i in range(num_loop):
    # the same numpy -> PIL -> numpy round-trip as in visualization_utils.py
    image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
    np.copyto(image, np.array(image_pil))
end_time = time.time()
print(type(image_pil))
print("PIL convert_from_numpy_array: {:.8f} sec".format(end_time - start_time))

output with 300x300 image size:

sample300x300.jpg
<class 'numpy.ndarray'>
<class 'PIL.Image.Image'>
PIL convert_from_numpy_array: 0.50078821 sec

output with 1280x720 image size:

sample1280x720.jpg
<class 'numpy.ndarray'>
<class 'PIL.Image.Image'>
PIL convert_from_numpy_array: 3.48462892 sec

For a 300x300 image, one second's worth of conversions (30 FPS with 20 objects) takes about half a second, so roughly half of the processing time is spent on this conversion alone. As the image size increases, the performance degradation becomes even more pronounced.

Rewriting the PIL code in visualization_utils.py to use OpenCV speeds it up.

I wrote sample code for ssd_mobilenet_v1.

  1. Download this code to your object_detection/utils directory: https://github.com/naisy/realtime_object_detection/blob/master/object_detection/utils/visualization_utils_cv2.py
  2. Edit your object_detection.py.
     before: from object_detection.utils import visualization_utils as vis_util
     after: from object_detection.utils import visualization_utils_cv2 as vis_util
  3. Run object_detection.py.

Note:

  • OpenCV cannot use custom fonts, so the labels are less beautiful.
    PIL: font = ImageFont.truetype('arial.ttf', 24)
    OpenCV: fontFace = cv2.FONT_HERSHEY_SIMPLEX
  • visualization_utils_cv2.py is not complete; it works only with ssd_mobilenet_v1.

naisy avatar May 30 '18 09:05 naisy

@naisy again great work! Would you mind filing a Pull Request? I would very much like to include this feature!

Another question: does your multi-processing show a performance increase? Does it get better results than multi-threading?

Also: try pulling my newest master, where the config is moved from a yml file into its own class. This should also make your multiprocessing code much leaner, as you don't have to load each config variable again, but can just pass the class. Try pulling and tell me what you think!

EDIT: I added your cv vis util. I had already done a cv implementation for the deeplab models, but adjusting it for object_detection as well was a very good thing!

EDIT2: I need to additionally update some code to use OpenCV instead of PIL for the mask drawing. Maybe you could help :)

def draw_mask_on_image_array(image, mask, color='red', alpha=0.4):
  """Draws mask on an image.

  Args:
    image: uint8 numpy array with shape (img_height, img_width, 3)
    mask: a uint8 numpy array of shape (img_height, img_width) with
      values of either 0 or 1.
    color: color to draw the mask with. Default is red.
    alpha: transparency value between 0 and 1. (default: 0.4)

  Raises:
    ValueError: On incorrect data type for image or masks.
  """
  if image.dtype != np.uint8:
    raise ValueError('`image` not of type np.uint8')
  if mask.dtype != np.uint8:
    raise ValueError('`mask` not of type np.uint8')
  if np.any(np.logical_and(mask != 1, mask != 0)):
    raise ValueError('`mask` elements should be in [0, 1]')
  if image.shape[:2] != mask.shape:
    raise ValueError('The image has spatial dimensions %s but the mask has '
                     'dimensions %s' % (image.shape[:2], mask.shape))
  rgb = ImageColor.getrgb(color)
  pil_image = Image.fromarray(image)

  solid_color = np.expand_dims(
      np.ones_like(mask), axis=2) * np.reshape(list(rgb), [1, 1, 3])
  pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA')
  pil_mask = Image.fromarray(np.uint8(255.0*alpha*mask)).convert('L')
  pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
  np.copyto(image, np.array(pil_image.convert('RGB')))

(Image and ImageColor are PIL modules: from PIL import Image, ImageColor.)

EDIT2: I did it:

mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2RGB)
mask[np.where((mask == [1,1,1]).all(axis = 2))] = color
cv2.addWeighted(mask,alpha,image,1-alpha,0,image)

It's added to the latest commit.

gustavz avatar Jun 06 '18 08:06 gustavz

Hi, @GustavZ,

I added the following two functions.

def draw_bounding_box_on_image_cv ()
def draw_bounding_box_on_image_array_cv ()

By searching with _cv you will find the fixes.

I deleted the following two functions.

def draw_bounding_box_on_image ()
def draw_bounding_box_on_image_array ()

Also, I changed the color definition for OpenCV: STANDARD_COLORS = [...

I didn't fix the next three functions:

def draw_mask_on_image_array ()
def draw_keypoints_on_image_array ()
def draw_keypoints_on_image ()

If this code is to run on other models, these functions will also need modifying.


My multi-processing code performs worse than multi-threading. I tried the following three methods to pass the image data to the worker process:

  • multiprocessing.Pipe()
  • multiprocessing.Manager().dict()
  • multiprocessing.Manager().Queue()

The fastest one is Pipe(), but even that is slower than multi-threading.
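A minimal sketch of the Pipe() approach (illustrative only, not naisy's actual code) shows why: every send() pickles and copies the whole frame into the child process, a cost that threads sharing one address space never pay:

```python
import multiprocessing as mp
import numpy as np

def worker(conn):
    # Receive frames until the parent sends None; each recv() unpickles a full copy.
    while True:
        frame = conn.recv()
        if frame is None:
            break
        conn.send(int(frame.sum()))  # send back a cheap result

if __name__ == '__main__':
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn,))
    p.start()
    frame = np.ones((720, 1280, 3), dtype=np.uint8)
    parent_conn.send(frame)    # the whole ~2.7 MB frame is serialized here
    print(parent_conn.recv())  # 2764800
    parent_conn.send(None)
    p.join()
```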


Certainly, multi-processing needs more code when using yml, which invites careless mistakes. But yml makes it easy to edit parameters, because they can be changed without touching the Python code.

Therefore, I think it is a good idea to keep the parameters in yml and provide a class that loads it.

naisy avatar Jun 06 '18 09:06 naisy

alright, now I have the config class plus the yaml file, and also OpenCV mask drawing besides your modified vis_util functions!

have a try :)

gustavz avatar Jun 06 '18 12:06 gustavz

Hi @GustavZ,

Nice! Both run_objectdetection.py and run_deeplab.py in v2.0 worked! But v2.0 is a little slower than v1.0: object_detection.py (v1.0) at 1280x720 reaches 27-28 FPS, while run_objectdetection.py (v2.0) at 1280x720 reaches 25-26 FPS. The slowdown appears regardless of visualization.

How about in your environment?

naisy avatar Jun 08 '18 10:06 naisy

@naisy Yeah, I see similar results. I don't know why there is a performance loss :/ maybe you have some pythonic insights?

Another point: currently I am drawing the masks with OpenCV like this:

mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2RGB)
mask[np.where((mask == [1,1,1]).all(axis = 2))] = color
cv2.addWeighted(mask,alpha,image,1-alpha,0,image)

The problem is that this correctly draws the colored masks, but darkens everything around them, so if there are several objects in the input image, everything becomes nearly black.

Do you have an idea how to solve this correctly? For info: the mask is a binary (1, 0) numpy array with the same size as the image, but lacking the color channels (so they need to be created). I tried it with

mask =  numpy.expand_dims(mask, 2) 
mask = numpy.repeat(mask, 3, axis=2)

instead of the cvtColor function, but it has the same effect.

gustavz avatar Jun 11 '18 09:06 gustavz

Hi @GustavZ,

I looked at run_deeplab.py.

before: cv2.addWeighted(seg_image, config.ALPHA, frame, 1-config.ALPHA, 0, frame)
after: cv2.addWeighted(seg_image, config.ALPHA, frame, 1.0, 0, frame)

With the before version the background looks dark; maybe this is the cause?

And about the color mapping:

before: seg_image = create_colormap(seg_map).astype(np.uint8)
after: seg_image = STANDARD_COLORS[seg_map]

The before version is not bad, but the after version is clearer and performs better, with STANDARD_COLORS defined as:

STANDARD_COLORS = np.array([
    (0, 0, 0), # Blank color for background
    (0, 255, 255), # Yellow
    (0, 140, 255), # DarkOrange
    (139, 139, 0), # DarkCyan
    (11, 134, 184), # DarkGoldenRod
    (169, 169, 169), # DarkGrey
    (107, 183, 189), # DarkKhaki
    (211, 0, 148), # DarkViolet
    (238, 130, 238), # Violet
    (0, 128, 128), # Olive
    (122, 150, 233), # DarkSalmon
    (147, 20, 255), # DeepPink
    (204, 50, 153), # DarkOrchid
    (143, 188, 143), # DarkSeaGreen
    (255, 255, 0), # Cyan
    (209, 206, 0), # DarkTurquoise
    (255, 191, 0), # DeepSkyBlue
    (255, 144, 30), # DodgerBlue
    (34, 34, 178), # FireBrick
    (0, 215, 255), # Gold
]).astype(np.uint8)
color time before: 0.01716923713684082
color time after: 0.009123802185058594
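The STANDARD_COLORS[seg_map] lookup works because numpy fancy indexing maps every label id in the segmentation map to its color row in one vectorized step; a toy example (illustrative palette, not the one above):

```python
import numpy as np

# Toy palette: label 0 = background (black), 1 = yellow, 2 = orange (BGR order).
colors = np.array([(0, 0, 0), (0, 255, 255), (0, 140, 255)], dtype=np.uint8)

# A 2x3 segmentation map of class ids, as deeplab would output per pixel.
seg_map = np.array([[0, 1, 2],
                    [2, 1, 0]])

seg_image = colors[seg_map]  # shape (2, 3, 3): one BGR color per pixel
print(seg_image.shape)       # (2, 3, 3)
print(seg_image[0, 1])       # [  0 255 255]
```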

Because a color can be defined per label in STANDARD_COLORS, I also think it is easier to understand.

naisy avatar Jun 12 '18 07:06 naisy

Thanks for the hints! Very useful as always @naisy 👍

But be careful: you only use 20 colors, and Deeplab is trained on 20+1 (background) classes. If you detect the last class (TV) you will probably run into an error.

gustavz avatar Jun 12 '18 17:06 gustavz

Yes! Need 20+1! Thank you!

naisy avatar Jun 13 '18 01:06 naisy