Very slow Inception?
Describe the Issue
I'm following this blog post to build a robot: oreilly.com/learning/how-to-build-a-robot-that-sees-with-100-and-tensorflow . In their video, Inception/classification takes about 10 seconds, but running with this version of TensorFlow, it takes 50 seconds, which seems way too slow. Is it supposed to be so slow? Is there anything special I need to do to get Inception to run faster?
Steps to Reproduce
Hardware/Software Info
I'm on a Raspberry Pi 3 Model B.
Please provide the following information about your Raspberry Pi setup:
- Raspberry Pi model: Raspberry Pi 3 Model B
- Operating System used: Raspbian Jessie
- Version of Python used:
- SD card memory size: 16 gb
- Size of USB/other device used as swap (if building from source):
- TensorFlow git commit hash (if building from source): not from source
Relevant Console Output/Logs
Interesting- I've never seen it take that long to process an image (it's slow, but not that slow). This is with the 0.10 binaries? How exactly are you running the Inception model?
@samjabrahams I use this binary: tensorflow-0.10.0-cp27-none-linux_armv7l.whl. I use this script, which is from the author of the article: https://github.com/lukas/robot/blob/master/classify_image.py
That script uses: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz . Did you use this model as well? What times did you get on it? Is there anything I can do to make it faster?
I see. There are a couple of things going on here:
First- his robot isn't actually using classify_image.py; it's actually using a statically compiled version of TensorFlow- check out the makefile example he's using here. Not sure why he has classify_image.py in that repo, as it isn't called from anywhere else in the codebase.
That said, we can do better than what you're seeing with classify_image.py. Right now, you're running classify_image.py each time you want to check an image, right? That is going to cause a large amount of overhead, as the code reloads the graph definition each and every time you run the script from the command line. Not only do you get overhead from recreating the graph, but TensorFlow makes on-the-fly adjustments every time you run from a Session.
To illustrate this, I made a quick adjustment to the classify_image.py file. It now loops over the prediction stage 25 times and prints out the average time of each prediction (in seconds) after finishing:
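The shape of that timing loop looks roughly like the sketch below. Note this is a stand-in: the real script calls `sess.run()` on the Inception graph with a JPEG feed, while here `predict()` is a placeholder so the structure is clear without needing TensorFlow installed.

```python
# Sketch of the timing harness described above. In the real
# classify_image_timed.py, predict() would be a sess.run() call on the
# Inception-v3 graph; here it is a stand-in returning dummy scores.
import time

NUM_RUNS = 25  # number of timed predictions to average over

def predict(image_bytes):
    # placeholder for sess.run(softmax_tensor, feed_dict=...)
    return [0.0] * 1008  # Inception-v3 emits 1008 class scores

def benchmark(image_bytes):
    times = []
    for _ in range(NUM_RUNS):
        start = time.time()
        predict(image_bytes)
        times.append(time.time() - start)
    return sum(times) / len(times)

avg = benchmark(b"fake-jpeg-bytes")
print("Average run: %.4f seconds" % avg)
```

Because the graph is built once before the loop starts, only the per-prediction cost is measured, which is why the averages come out so much lower than a cold start.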
You'll notice that it takes ~30 seconds to get the first prediction out, but afterward each iteration is a little bit faster. For me, I get an average of about 2.5 seconds per prediction.
So how will you be able to take advantage of this? You'll need to keep the TensorFlow Session active while the robot is running. How you do this is up to you, but the basic gist is to include loading the TensorFlow graph and session in your robot's live server or logic loop or what-have-you. That way you only load the graph once and can continue to feed it new inputs without needing to reload.
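The load-once pattern can be sketched like this. The class and method names are illustrative (not from the original script), and the expensive TensorFlow calls are replaced with comments so the sketch runs anywhere; the point is only that graph loading happens once, outside the loop.

```python
# Sketch of keeping the model resident: build the graph once at startup,
# then reuse it inside the robot's main loop. Names here are
# illustrative; the real _load_graph would call tf.import_graph_def()
# and create a tf.Session.
class Classifier:
    def __init__(self):
        self.load_count = 0
        self._load_graph()  # expensive: happens exactly once

    def _load_graph(self):
        # real version: parse the GraphDef, import it, open a Session
        self.load_count += 1

    def classify(self, image_bytes):
        # real version: self.sess.run(softmax_tensor, feed_dict=...)
        return "prediction for %d bytes" % len(image_bytes)

clf = Classifier()                   # startup cost paid once
for frame in [b"img1", b"img22"]:    # robot's live loop
    clf.classify(frame)              # cheap: graph already in memory
```

However many frames the loop processes, the graph is only loaded once, which is where the bulk of the savings comes from.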
@samjabrahams Wow, thanks for the detailed response. I had just been using his classify_image script with your TensorFlow version. I'll hold the model in memory and use your script. I plan to write benchmarks across a bunch of different models running on Raspberry Pi. I'd love to talk more with you over email; I didn't see yours, so can you give me your email or email me? (Mine is on my profile.)
Yes, I have the same problem with TensorFlow running too slowly: http://stackoverflow.com/questions/40705203/tensorflow-label-image-recognize-so-slow
P.S.: If you have time, please take a look at my other questions too: http://stackoverflow.com/questions/40744779/can-i-use-inception-image-retraining-in-tensorflow-to-recognize-letters http://stackoverflow.com/questions/40764933/recognize-letters-ocr-using-image-retrainingv0-10-0-in-tensorflow-but-differ
@samjabrahams Me too, it is getting too slow; sometimes it takes a minute and a half for an image :( I downloaded it using pip. Do you have any suggestions? Thank you, Sam!!
@shaolinkhoa @joekh1 Have you read the above comment? It takes a while to load up the model into memory and run the first few iterations, but after that it should be down to closer to 2 seconds per run.
Please run this benchmark on your RPi (download it, and run python classify_image_timed.py) and report the average times you get.
I'm currently formatting a Raspberry Pi to see if I can replicate slow times when installing from the binaries.
@samjabrahams Yeah, I saw it. Where do I put classify_image.py? I saw it in /usr/local/lib/python2.7/dist-packages/tensorflow/models/image/imagenet. Every time I run it, it starts to download a file but stops in the middle. I downloaded the file from my PC and then transferred it to the Raspberry Pi, but nothing changed, maybe because I don't know where to put it!!
Don't use the version provided in dist-packages. The version you want (the one I linked to) is called "classify_image_timed.py", not "classify_image.py". I modified the original classify_image.py file to print out summary statistics. It doesn't matter where you download the file.
Easiest thing to do would be to copy/paste these lines from your Raspberry Pi terminal:
wget https://raw.githubusercontent.com/samjabrahams/tensorflow-on-raspberry-pi/master/benchmarks/inceptionv3/classify_image_timed.py
python classify_image_timed.py
It will take a while to run, as it needs to download the Inception model (~85 MB). Just be patient and let the script finish running; it will print out summary statistics.
@samjabrahams Oh thank you so much bro!! I will try it and tell you if it works!! thanks again buddy!!
As an update: from a completely fresh install of Raspbian, I achieved the same speeds from the pre-built binaries as I did from installing from sources.
@samjabrahams Thank you so much!!! I tried it and it took 1.89 seconds, but when I ran it again, I didn't get the panda image; it started to download again! Is that usual? :(
It's weird that it's re-downloading everything; it should be saved to a /tmp/ directory. When I rerun it on my RPi, it doesn't re-download the file.
It doesn't output any info about predicting the panda image as I took out those print statements from the original file. If you add these lines back into the file, you should be able to get that info back :)
@samjabrahams I fixed the download by going root and putting the .tgz file in the imagenet directory. Sorry for all my questions, but every time I want to test TensorFlow, do I have to use this file? Can I use it with OpenCV? Thanks buddy!!
I mean, you don't have to use it- it's just a handy way to check the time difference between different platforms (RPi, OSX, Linux w/ GPU, etc).
Not sure how it would work with OpenCV- that's a completely different library.
@samjabrahams Thanks buddy, one last question: I tried to run inception_v3.py, but it is slow!! Can you please tell me what is wrong with it? https://github.com/fchollet/deep-learning-models/blob/master/inception_v3.py
It is slow because every time you run that program from the command line, it has to rebuild the entire model from scratch, as I explained in the longer post above. It takes a significant amount of time to import tensorflow and build the model. Additionally, the first time you run any model after loading it into memory will be much slower than subsequent runs, as TensorFlow makes optimizations every time a model is run. In order to make practical use of TensorFlow on a Raspberry Pi, you have to keep the program running and store the model in memory.
@samjabrahams Amazing help!! I was able to use it on my own codes, within seconds I got great results!! Thank you so much buddy!!
@samjabrahams tensorflow is really running slow for me on pi help
@vishnuog Can you be a bit more specific about what you're trying to do and what your issue is?
Sir, I am using the TensorFlow you compiled on my Pi. It takes about 90 seconds to get the output. Is there a faster version of TensorFlow? If there is, where can I get it? Can you send links from where I can download it? I installed it using the commands given below:
sudo apt-get update
sudo apt-get install python-pip python-dev
wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v0.12.1/tensorflow-0.12.1-cp27-none-linux_armv7l.whl
sudo pip install tensorflow-0.12.1-cp27-none-linux_armv7l.whl
sudo pip uninstall mock
sudo pip install mock
@vishnuog I'm assuming you're doing something along the lines of using the pre-trained Inception model. I'm going to copy-paste my response from above which is the most important advice for making sure you get better runtimes:
It is slow because every time you run that program from the command line, it has to rebuild the entire model from scratch, as I explained in the longer post above. It takes a significant amount of time to import tensorflow and build the model. Additionally, the first time you run any model after loading it into memory will be much slower than subsequent runs, as TensorFlow makes optimizations every time a model is run. In order to make practical use of TensorFlow on a Raspberry Pi, you have to keep the program running and store the model in memory.
Are you running a command like this every time you want to make a prediction?
$ python my_tensorflow_script.py
If you are, then you're going to see a huge amount of time lost to importing the TensorFlow library, loading the pre-trained model into memory, and re-optimizing sessions.
To see a quick example of how you might persist a model in memory, I have a repository showcasing a simple TensorFlow with Flask example. It's by no means production-ready, but it should be easy to follow. The main bit to look at is the Session singleton, which ensures there's only one TensorFlow Session and "warms up" the Session ahead of time.
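The core of that singleton idea, minus the Flask wiring, can be sketched in a few lines. This is an illustrative stand-in (the real repo wraps an actual TensorFlow Session): no matter how many requests arrive, only one expensive session object is ever created, and it gets warmed up before serving real traffic.

```python
# Sketch of the Session-singleton idea: however many times the class is
# instantiated, only one (expensive) session is created, and it can be
# "warmed up" with a throwaway run before the first real request.
class SessionSingleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.warm = False  # real version: build tf.Session here
        return cls._instance

    def warm_up(self):
        # real version: run one dummy inference so TensorFlow's first-run
        # optimizations happen before any user request arrives
        self.warm = True

a = SessionSingleton()
a.warm_up()
b = SessionSingleton()  # same object: no second session is built
```

A request handler would then call into this shared object instead of constructing its own session, so the per-request cost stays close to the warm benchmark numbers.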
Again, you haven't really been specific about what you're trying to do in TensorFlow, so I can't give you a specific diagnosis of the problem. Just saying "It's slow! The vague thing I'm trying to do is too slow! Can it be faster?" doesn't give me a lot to go off of :)
Not sure if this is any use but there is big difference between the new 4.9 kernel and the older 4.4 kernel.
classify_image.py benchmark on 4.4 (Linux pi3 4.4.50-v7+ #970 SMP Mon Feb 20 19:18:29 GMT 2017 armv7l GNU/Linux)
Best run: 15.0607
Worst run: 17.3751
Average run: 16.4809
Build graph time: 5.4814
Number of warmup runs: 10
Number of test runs: 25
Versus a similar test:
classify_image.py benchmark on 4.9 (Linux pi8 4.9.14-v7+ #977 SMP Mon Mar 13 18:25:19 GMT 2017 armv7l GNU/Linux)
Best run: 1.7282
Worst run: 1.7990
Average run: 1.7604
Build graph time: 6.3642
Number of warmup runs: 10
Number of test runs: 25
Pretty close to results here: https://github.com/samjabrahams/tensorflow-on-raspberry-pi/tree/master/benchmarks/inceptionv3
Moral of the story: make sure you run rpi-update and reboot regularly.
Can somebody help me out here? I am successfully running TensorFlow on my Pi, but the problem is that every time I shut down my Pi and log back in, the imagenet folder is missing from my /tmp/ directory. How is this happening? Somebody please reply, thanks.
@vishnuog - the /tmp/ directory is designed to be cleared on restart (hence "temporary"). If you want the model to download to a specific directory and stick around, use the --model_dir command line argument:
python classify_image_timed.py --model_dir='inception'
Let's not use this as a support line. I'm sure @samjabrahams is extremely busy. The code does work and is pretty fast.
Hi @vvkv - the main thing is that you'd want to setup some sort of serving architecture or runtime loop. If you use your model by running the Python script from the command line, you end up having to go through the entire import process again from scratch.
You might try looking into using Flask to create a lightweight, persistent server. You would then use HTTP to send requests to the server, which would execute the model. I have a simple demonstration available using Flask+TensorFlow here. The example isn't production worthy, as the web server and TensorFlow model are on the same thread, so the server blocks until it's finished running the graph. That said, it's simpler this way, which makes it a bit easier to learn from.
@samjabrahams Thank you for your pointers. These will be very useful
@vvkv did you mean to delete your previous message? It'd be nice to have around for context if other people are looking to answer the same question you did.
@samjabrahams Nope, I didn't mean to do that. Let me repost the question so people know what you were referring to. I'll be retyping it, so some of the details might be lost, but I will be sure to include the key points. Thanks for pointing it out.
The question I asked in context to Sam's response to me:
I am working with TensorFlow for Poets and trying to run it off my Raspberry Pi 3. TensorFlow for Poets essentially uses the Inception model and retrains the last layer of the net so that one can train the network with their own data. My key problem was that running the script label.py on my Pi took about 20 seconds, which is an unreasonable time by any means. I read through Sam's previous answers and understood that I was loading the graph every time I ran the script, and I was looking for ways to reduce this time to the benchmark of 2.4 seconds (the average time when I ran the benchmark tests on my Pi). My question to Sam was how he would recommend I load the graph ahead of time, so that every use does not call for a reload of the graph and I can save time in practical situations.
label.py can be found in the Tensorflow for poets tutorial here: https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0
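Once the retrained graph stays loaded, the per-image work that remains is cheap: feed the image through the resident session and map the resulting score vector to the retrained labels. The label-mapping half of that can be sketched in plain Python; the label names and scores below are made up for illustration, not taken from the tutorial.

```python
# Sketch of the cheap per-image step once the graph is resident: the
# labels file is read once at startup, and each new score vector from
# sess.run() is mapped to its best labels. Labels/scores here are
# illustrative stand-ins.
def top_labels(scores, labels, k=3):
    # pair each score with its label and sort highest-score first
    ranked = sorted(zip(scores, labels), reverse=True)
    return [label for _, label in ranked[:k]]

labels = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
scores = [0.05, 0.10, 0.70, 0.10, 0.05]
print(top_labels(scores, labels))  # best-scoring label comes first
```

The expensive parts (importing tensorflow, importing the GraphDef, creating the Session) all stay outside this per-image path, which is what brings the time down from a ~20-second cold start toward the ~2-second warm benchmark.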
Sam's answer has been very useful.
Thank you again :)