KittiBox
How to perform fast inference (real-time)
Dear MarvinTeichmann, I have run your code normally and gotten correct results. However, I found the running time to be too long (your paper reports an inference time of about 30+ ms). My GPU is a K40, and the results are as follows:
```
2017-03-10 13:43:10,864 INFO /home/mifs/mttt2/local_disk/RUNS/TensorDetect2/paper_bench/tau5_zoom_0_kitti_2016_11_09_05.57/model.ckpt-179999
2017-03-10 13:43:17,416 INFO Weights loaded successfully.
2017-03-10 13:43:17,416 INFO Starting inference using data/demo2.png as input
time_h: 2.3411090374
2017-03-10 13:43:19,819 INFO 7 Cars detected
2017-03-10 13:43:19,819 INFO
2017-03-10 13:43:19,819 INFO Coordinates of Box 0
2017-03-10 13:43:19,819 INFO x1: 425.5
```
Could you help me? Thank you very much!
```
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:03:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0)
2017-03-09 22:17:07,062 INFO /home/mifs/mttt2/local_disk/RUNS/TensorDetect2/paper_bench/tau5_zoom_0_kitti_2016_11_09_05.57/model.ckpt-179999
2017-03-09 22:17:10,497 INFO Weights loaded successfully.
2017-03-09 22:17:10,497 INFO Starting inference using data/demo2.png as input
2017-03-09 22:17:12,558 INFO 7 Cars detected
2017-03-09 22:17:12,558 INFO
2017-03-09 22:17:12,558 INFO Coordinates of Box 0
2017-03-09 22:17:12,558 INFO x1: 425.5
2017-03-09 22:17:12,558 INFO x2: 464.5
2017-03-09 22:17:12,558 INFO y1: 183.5
2017-03-09 22:17:12,559 INFO y2: 204.5
2017-03-09 22:17:12,559 INFO Confidence: 0.945907235146
2017-03-09 22:17:12,559 INFO
```
@bigsnarfdude, I think you mean the same question? Your run also needs about 2 s to do the task.
Firstly, I am using a Titan X (Pascal) to measure runtime. The K40 is rather old, so you might not get the same results. In addition, demo.py is not meant to measure inference time. The image is loaded from disk sequentially and fed to the graph using placeholders, which is slow according to the TensorFlow documentation. In addition, inference is performed only once. The first time inference is run, TensorFlow selects the subgraph which needs to be computed; the whole thing is much faster if the same op is computed multiple times. And lastly, demo.py plots a visualization in Python. Computing a visualization is not considered part of the actual detection. (It can be done in parallel on the CPU anyway, so there is no need to wait until this computation is finished.)
To achieve the throughput reported in the paper, images are loaded from disk in parallel using TensorFlow queues. It can be assumed that a real-time system does not store its input on an HDD but is able to provide the data in memory, so this is a fair comparison. In addition, the same op (with different input) is evaluated 100 times and the average runtime is reported.
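Roughly, such a queue-based input pipeline looks like this in TensorFlow 1.x. This is a minimal sketch, not the exact code used for the paper numbers; the `image_files` list and the 384x1248 input size are assumptions:

```python
import tensorflow as tf

# Minimal sketch of a queue-based input pipeline. Background threads
# keep the batch tensor filled, so the GPU never waits on the disk.
image_files = ['data/demo.png', 'data/demo2.png']  # assumed file list
filename_queue = tf.train.string_input_producer(image_files)
reader = tf.WholeFileReader()
_, raw_png = reader.read(filename_queue)
image = tf.image.decode_png(raw_png, channels=3)
image = tf.image.resize_images(image, [384, 1248])  # assumed input size
image_batch = tf.train.batch([image], batch_size=1)

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# ... evaluate the detection op on image_batch here ...

coord.request_stop()
coord.join(threads)
```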
I will provide code for fast inference after the ICCV deadline. The purpose of demo.py is to provide simple code so that users not familiar with TensorVision can see how the model works. demo.py is kept simple for this purpose, and all the advanced TensorFlow queueing machinery is left out.
I didn't have an opinion on whether the inference time is fast or slow. I have a Titan X (Pascal) and just provided the output for reference. Thanks @MarvinTeichmann for the code. I look forward to the future releases.
Btw., the fact that both of you get an inference time of about 2 s shows that the GPU is not the bottleneck in the current setup. One would expect a Titan X (Pascal) to be about 2-3 times faster. So most of the time is actually spent reading the data, loading the computational graph onto the GPU, etc.
For a quick and dirty speed benchmark you can do something like this:
```python
from time import time

# One run to ensure that the tensorflow graph is loaded into the GPU
sess.run([pred_boxes, pred_confidences], feed_dict=feed)

# Time 100 further runs of the same op and report the average
start_time = time()
for i in range(100):
    sess.run([pred_boxes, pred_confidences], feed_dict=feed)
total_time = (time() - start_time) / 100.0
```
This should give you an inference speed close to the one cited in the paper.
TensorFlow devs have documented that `feed_dict` is one of the slower methods of passing data. (My thoughts: if `feed_dict` is used for the current inference calculations, then I would imagine that other methods may increase inference speed once the pipeline is optimized.)
Two different docs describe better ways of getting data to the GPU for both inference and training:
- https://www.tensorflow.org/programmers_guide/reading_data
- https://www.tensorflow.org/extend/new_data_formats
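For illustration, here is what a `feed_dict`-free setup might look like; `build_model` is a hypothetical stand-in for the KittiBox graph-building code, and `image_batch` is a queue-fed tensor like the one sketched above:

```python
import tensorflow as tf

# Hypothetical sketch: the model consumes the dequeued batch directly,
# so sess.run() needs no feed_dict and input loading overlaps with
# GPU compute. build_model stands in for the real graph-building code.
pred_boxes, pred_confidences = build_model(image_batch)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(sess=sess, coord=coord)
    # No feed_dict: the queue runners supply the input.
    np_boxes, np_confs = sess.run([pred_boxes, pred_confidences])
```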
Thanks to @bigsnarfdude and @MarvinTeichmann. The slower speed of my run comes down to two things:
- my card is a K40 (Kepler), which is much slower than a Titan X (Pascal) (about 180 ms vs. 30 ms for VGG16)
- the first inference run is slower than the following ones
Finally, thanks a lot for sharing the code! Exciting work!
People seem to love demo.py. As mentioned earlier, demo.py was not designed as evaluation code and is very slow; it is meant as a way to understand how the code works. evaluate.py is meant to be used for evaluation. However, people seem to love demo.py (#30, #41, #54). If you don't want to mess around with the evaluation model, modify demo.py to perform evaluation of images in a loop like this:
```python
for image in images:
    feed = {image_pl: image}
    softmax = prediction['softmax']
    output = sess.run([softmax], feed_dict=feed)
```
This will avoid building the entire TensorFlow graph for each image. It is still not perfect, but way faster than calling the whole demo.py script once per image.
If you would like to measure running time, keep in mind that TensorFlow builds the graph and allocates memory during the first run, so don't measure the time it takes for the first image. See the benchmark comment above.
Hello! Do you have updates on the code for fast inference?
No, sorry, I did not find the time to work on this. For a good start, use the loop I suggested in the comment above.
Hi, I got this error:

```
softmax = prediction['softmax']
KeyError: 'softmax'
```

when I tried to use your tip:
```python
for image in images:
    feed = {image_pl: image}
    softmax = prediction['softmax']
    output = sess.run([softmax], feed_dict=feed)
```
All modules were loaded successfully.
@lukaspistelak, the code that @MarvinTeichmann posted, i.e.,

```python
for image in images:
    feed = {image_pl: image}
    softmax = prediction['softmax']
    output = sess.run([softmax], feed_dict=feed)
```

is for KittiSeg's demo.py. For KittiBox, try:
```python
for image in images:
    feed = {image_pl: image}
    # Run KittiBox model on image
    pred_boxes = prediction['pred_boxes_new']
    pred_confidences = prediction['pred_confidences']
    (np_pred_boxes, np_pred_confidences) = sess.run([pred_boxes,
                                                     pred_confidences],
                                                    feed_dict=feed)
```
Hope this helps.
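For completeness, here is a hypothetical way to drive that loop, assuming `image_files` is a list of image paths; the tensor lookups can also be hoisted out of the loop, since they never change:

```python
import scipy.misc

# Look the output tensors up once; they are the same for every image.
pred_boxes = prediction['pred_boxes_new']
pred_confidences = prediction['pred_confidences']

for image_file in image_files:
    image = scipy.misc.imread(image_file)  # load the next input image
    feed = {image_pl: image}
    (np_pred_boxes, np_pred_confidences) = sess.run(
        [pred_boxes, pred_confidences], feed_dict=feed)
```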
If we wanted to grab the output file after rectangles have been drawn on the image, we would need to include

```python
# Apply non-maximal suppression
# and draw predictions on the image
output_image, rectangles = kittibox_utils.add_rectangles(
    hypes, [image], np_pred_confidences,
    np_pred_boxes, show_removed=False,
    use_stitching=True, rnn_len=1,
    min_conf=0.50, tau=hypes['tau'], color_acc=(0, 255, 0))
```

in that for loop, where `output_image` would be the image with the rectangles drawn, correct? What is the format of `output_image` here?
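If it is a plain NumPy image array (height x width x 3), as the save step in demo.py suggests, I would assume something like this works for writing it out:

```python
import scipy.misc

# Assuming output_image is a NumPy array of shape (height, width, 3)
# in RGB order, it can be written straight to disk:
scipy.misc.imsave('prediction.png', output_image)
```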
Thanks for your work on this!