Suggestion: convert Keras/TFLite to be multi-threaded on inference

Open cloud-rocket opened this issue 3 years ago • 7 comments

We are not utilizing the multi-core capability for the most CPU-consuming task, and as a result we get low FPS. (See stats here: https://github.com/autorope/donkeycar/issues/690.)

I barely measure 19 Hz on the Nano with tflite. Given the timing, FPS cannot be higher: at 52 ms average per tflite step, 1000 / 52 ≈ 19.

I suggest the following:

  • Push run_threaded inputs to an ordered queue "A" (ordered by push timestamp)
  • Pop from an ordered queue "B" on run_threaded output to return the result (block if empty)
  • Run Keras/tflite inference inside update, reading inputs from queue "A" and pushing outputs to queue "B" (preserving the same input timestamp order)

As a result we'd get almost 4x the FPS on a Pi4, or almost 8x on a Nano! A minimal sketch of the idea follows.
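
Here is a rough sketch of the proposed part, assuming the standard donkeycar part interface (run_threaded / update / shutdown); the wrapper class name and the wrapped pilot's run() method are placeholders, not existing donkeycar code:

```python
import itertools
import queue
import time


class ThreadedPilotWrapper:
    """Runs the wrapped pilot's inference in the part's update thread."""

    def __init__(self, pilot):
        self.pilot = pilot                    # existing Keras/tflite pilot
        self.counter = itertools.count()      # tie-breaker for equal timestamps
        self.queue_a = queue.PriorityQueue()  # queue "A": inputs, timestamp order
        self.queue_b = queue.PriorityQueue()  # queue "B": outputs, same order
        self.running = True

    def run_threaded(self, img_arr):
        # push the newest frame to queue "A", keyed by (timestamp, counter)
        self.queue_a.put((time.time(), next(self.counter), img_arr))
        # pop from queue "B" to return the result (blocks if empty)
        _, _, outputs = self.queue_b.get()
        return outputs

    def update(self):
        # inference loop: read from queue "A", run the model, push to queue "B"
        while self.running:
            ts, n, img_arr = self.queue_a.get()
            self.queue_b.put((ts, n, self.pilot.run(img_arr)))

    def shutdown(self):
        self.running = False
```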

What do you think? I can push the code, but I don't have a working environment (my models are not working yet), so I need somebody to test it in a real setup.

/cc: @DocGarbanzo , @sctse999, @tikurahul

cloud-rocket avatar Jan 27 '21 04:01 cloud-rocket

I get full CPU utilisation when running inference on the RPi, because numpy, and probably TF too, use multithreading internally.
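
For reference, TF 2.x exposes those internal threading knobs via tf.config; a minimal illustration, not donkeycar code (the settings must be applied before TF executes any ops):

```python
import tensorflow as tf

# 0 means TF picks a default based on the available cores
print(tf.config.threading.get_intra_op_parallelism_threads())

tf.config.threading.set_intra_op_parallelism_threads(4)  # threads within one op
tf.config.threading.set_inter_op_parallelism_threads(2)  # independent ops in parallel
```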

DocGarbanzo avatar Jan 31 '21 17:01 DocGarbanzo

I get > 45 Hz easily. I am not sure where the bottlenecks are in your case.

tikurahul avatar Jan 31 '21 20:01 tikurahul

> I get > 45 Hz easily. I am not sure where the bottlenecks are in your case.

@tikurahul - What is your setup and TF version?

cloud-rocket avatar Feb 02 '21 03:02 cloud-rocket

I use a Jetson Nano, so my experience may not be super applicable here. :smile:

tikurahul avatar Feb 13 '21 16:02 tikurahul

Actually, after a recent RPi software upgrade, TF has become much slower for '.h5' models; that's probably why you are seeing this. Do you see the same effect for .tflite models too?

DocGarbanzo avatar Feb 14 '21 14:02 DocGarbanzo

I am working on a generic multi-threaded solution (please see https://github.com/cloud-rocket/donkeycar/blob/add-multithreaded-keras-pilot/donkeycar/parts/keras.py).

But for some reason I only see performance degradation (with both the h5 and tflite options) - I still don't understand why...

cloud-rocket avatar Feb 14 '21 17:02 cloud-rocket

Can you check with the latest version on dev? Set the variable CREATE_TENSOR_RT = True and use the .trt TensorRT model on the Nano.
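
For reference, that would look like this (assuming the variable goes in your myconfig.py):

```python
# myconfig.py
CREATE_TENSOR_RT = True  # also produce a .trt model alongside the trained model
```

Then drive with the generated .trt file on the Nano.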

DocGarbanzo avatar Jul 11 '21 18:07 DocGarbanzo

I'm going to close this. Both TensorFlow and TensorFlow Lite support multi-threaded inference. Here is Google's page on how to profile a model and the levers for increasing performance, including throwing more threads at it: https://www.tensorflow.org/lite/performance/best_practices#tweak_the_number_of_threads
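
For example, with the TF Lite Python API the thread count is an Interpreter constructor argument (the model path here is illustrative):

```python
import tensorflow as tf

# num_threads controls how many threads TFLite uses for op execution
interpreter = tf.lite.Interpreter(model_path='pilot.tflite', num_threads=4)
interpreter.allocate_tensors()
```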

Ezward avatar Mar 04 '23 01:03 Ezward