knowledge-repo icon indicating copy to clipboard operation
knowledge-repo copied to clipboard

Rendering posts with embedded images is very slow

Open rene-giretzlehner opened this issue 7 years ago • 26 comments

Auto-reviewers: @NiharikaRay @matthewwardrop @earthmancash @danfrankj

we are running knowledge-repo in our company since a few weeks on a server and are facing issues with high cpu load and very slow response times when clicking on a knowledge post the first time. It takes up to 20sec to render the post. During this time the CPU is up to 100%. When clicking on the post a 2nd time - it's fast as you would expect a page to render. The same thing occurs with every other post as well. As soon as we restart the server process - the issues start from scratch.

Did you ever have any of these issues? We're running on a ubuntu box.

rene-giretzlehner avatar Mar 16 '17 11:03 rene-giretzlehner

Hi @rene-giretzlehner ! Thanks for reporting this issue to us!

When the knowledge-repo starts up, it creates/updates an index database for the posts. If you do not specify a path for this index, it creates the index in-memory. I suspect that that is happening is that you have not specified a path, so every time it restarts it needs to recreate this index.

Can you confirm that this is the case?

To prevent this initial load, you can use a local sqlite or remote (potentially shared) database between (perhaps more than one) knowledge_repo instances. For example:

knowledge_repo --repo <repository> deploy --dburi sqlite:///test.db knowledge_repo --repo <repository> deploy --dburi mysql://db_user@localhost/knowledge

Does this help?

matthewwardrop avatar Mar 17 '17 04:03 matthewwardrop

We have configured the database via server_config.py SQLALCHEMY_DATABASE_URI = 'sqlite:////home/deep/knowledge.db' and startup the server with knowledge_repo --repo {magic}/home/deep/deepthought --repo {webposts}sqlite:////home/deep/dbrepo.db:mypostreftable runserver --config server_config.py --port 80

So the database grows over time and viewcount etc. updates. But on every server restart the behaviour is again as described in our initial post.

rene-giretzlehner avatar Mar 20 '17 14:03 rene-giretzlehner

@rene-giretzlehner Hmmm... I think I understand what is going on here. Currently, the database sync is triggered if it has been more than 5 minutes since the last index and the data repository has changed since its last update. The SQLite database does not support threading, and so when using an SQLite database to store the index metadata, updates are done synchronously. There are improvements in the pipeline for these synchronisation processes, not all of which will reach sqlite databases unfortunately, you can greatly reduce this initial startup time by:

  1. Using deploy rather than runserver to deploy the knowledge repo. The runserver mode is primarily for development, and does not handle multiple requests very well.
  2. Using a mysql/postgres/server-based database, rather than SQLite. The result of the above will be quite pronounced in your experience. Index updates will be done asynchronously, and you should never see a significant lag loading the page.

We're in the process of improving our documentation, but in the mean time, I hope that helps.

matthewwardrop avatar Mar 20 '17 18:03 matthewwardrop

Thanks for your feedback - I'll give a MySQL a try and let you know if this resolves our issues!

rene-giretzlehner avatar Mar 21 '17 08:03 rene-giretzlehner

Short update - it got worse ;-)

I installed MySQL and start the server now with this command: sudo knowledge_repo --repo {magic}/home/deep/deepthought deploy --config server_config.py --port 80 [2017-03-23 17:17:43 +0000] [9254] [INFO] Starting gunicorn 19.7.0 [2017-03-23 17:17:43 +0000] [9254] [INFO] Listening at: http://0.0.0.0:80 (9254) [2017-03-23 17:17:43 +0000] [9254] [INFO] Using worker: sync [2017-03-23 17:17:43 +0000] [9268] [INFO] Booting worker with pid: 9268 [2017-03-23 17:17:43 +0000] [9269] [INFO] Booting worker with pid: 9269 [2017-03-23 17:17:44 +0000] [9272] [INFO] Booting worker with pid: 9272 [2017-03-23 17:17:44 +0000] [9274] [INFO] Booting worker with pid: 9274 DB config in server_config.py:SQLALCHEMY_DATABASE_URI = 'mysql://deep@localhost/knowledge'`

Database is setup the correct way, but now every pageview of knowledge posts take up to 20sec to render.

in console the "top" command shows that: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9268 root 20 0 147864 72900 6876 R 100.0 1.8 0:08.20 knowledge_repo

Any other idea what could cause that high CPU load and slow response times?

rene-giretzlehner avatar Mar 23 '17 16:03 rene-giretzlehner

@rene-giretzlehner Whoops! Well... that wasn't supposed to happen!

There is one other issue which I know of that we have suffered from internally, which was a result of merging https://github.com/airbnb/knowledge-repo/pull/229. It seemed to increase server load significantly, but I've yet to find anything wrong with it (and without it, postgres and other databases don't work). I'll take another look soon.

In the mean time, can you try either reverting that commit, or seeing if you see the same behaviour with 0.7.5 ? If you don't see the same behaviour, great! Otherwise, let's set aside some time to see what's going wrong early next week :).

matthewwardrop avatar Mar 23 '17 21:03 matthewwardrop

I downgraded to 0.7.5 - still the same issue :-) High CPU load on each post pageview. Do you have any other idea what this could cause?

rene-giretzlehner avatar Mar 27 '17 08:03 rene-giretzlehner

@rene-giretzlehner No clue. I'll look into this. Thanks so much for being patient throughout :).

Out of curiosity, how many posts do you have in your knowledge repo system? [So I can reproduce the same potential load issues here].

matthewwardrop avatar Mar 28 '17 03:03 matthewwardrop

haha...not so many right now - we just started testing / playing around with it - but want to use it for all our knowledge posts from datascience & engineering perspective in future (for that - it really has to run smoothly).

So it's 7 posts containing images - that's it :-)

rene-giretzlehner avatar Mar 28 '17 15:03 rene-giretzlehner

@rene-giretzlehner Hmmm... one idea remains. How large are the images in the knowledge posts? At the moment these images are base64 encoded, so if the images are huge, perhaps that could be a related issue.

I'll look into this soon :).

matthewwardrop avatar Mar 28 '17 15:03 matthewwardrop

a usual post contains 10-20 images (png) each between 30kb-45kb.

rene-giretzlehner avatar Mar 29 '17 07:03 rene-giretzlehner

we did a test without images in the post - renders super fast :-) so the root-cause seems to really be the images. is there anything we can do?

rene-giretzlehner avatar Mar 29 '17 08:03 rene-giretzlehner

@rene-giretzlehner Ah! I think we're getting closer to understanding it :). One final question before I have everything I need to know: is it slow only for posts stored in the sqlite database? If so, if you instead you use a mysql database for the posts, do these issues disappear? [I'd be cautious about using sqlite databases for posts, since they only support single threads, resulting in locking issues].

i.e use --repo {webposts}mysql://deep@localhost/knowledge:webposts or some such.

matthewwardrop avatar Mar 29 '17 15:03 matthewwardrop

I changed to MySQL as you suggested already last week, but it did not improve the behaviour - so no SQlite db in use anymore on our system.

sudo knowledge_repo --repo {magic}/home/deep/deepthought deploy --config server_config.py --port 80

having this db config in server_config.py SQLALCHEMY_DATABASE_URI = 'mysql://deep@localhost/knowledge'

rene-giretzlehner avatar Mar 30 '17 06:03 rene-giretzlehner

Ah.... so no more webposts either! Right, well... I'll go away and try to figure this out.

The reason we probably haven't seen this internally is that we upload all images to an S3 bucket prior to adding the post to the repository, and hence images are not embedded in the repository. The recipe for doing this is to add use "https://github.com/airbnb/knowledge-repo/blob/master/resources/extract_images_to_s3.py", and import/embed it in the git repostiory .knowledge_repo_config.py file; and add to your configuration file:

# Postprocessors to apply when importing KnowledgePost objects into the repository.
# Note that KnowledgePost objects by default run 'extract_images' and 'format_checks'.
# Order is important.
postprocessors = ['extract_images_to_s3']

In the meantime, I'll try to sort out the performance issues surrounding embedded images.

matthewwardrop avatar Mar 30 '17 19:03 matthewwardrop

@rene-giretzlehner I pushed a fix that I think should address at least part of this in #256. I did some profiling of the knowledge repository and it seems like its actually the code highlighting that was taking a long time. Can you confirm that this improve things?

matthewwardrop avatar Apr 15 '17 22:04 matthewwardrop

@matthewwardrop we tested the fix - didn't help in our case. anything we could test further?

rene-giretzlehner avatar Apr 19 '17 14:04 rene-giretzlehner

@rene-giretzlehner Not at this stage. Will try some more profiling of various different cases (previously I dealt with ~20-50 plots in a post, which I thought would emulate your setup). What might help is if you can provide a sample knowledge post (appropriately censured) for which you experience the slow-downs. I can then use that in my profiling runs to make sure I deal with the issue you are explicitly facing.

matthewwardrop avatar Apr 20 '17 19:04 matthewwardrop

Hi @matthewwardrop,

I'm sending you an example post as substitute for @rene-giretzlehner :-) Please find the source files at https://github.com/Breaka84/knowledge-repo/tree/plotly_md_test/tests/test_posts/slow_rendering This post takes about 25 to 30 seconds to initially load after a server restart.

Small hint, if you have the r plotly package not yet installed: I needed to additionally install the apt packages of libcurl4-openssl-dev & gfortran to install the necessary r-libraries.

Best, David

Breaka84 avatar Apr 25 '17 15:04 Breaka84

Thanks @Breaka84! I'll see if I can get to the bottom of this soon.

matthewwardrop avatar Apr 25 '17 19:04 matthewwardrop

@Breaka84, Even with your sample post, everything is snappy in the knowledge repository here... so I'm not quite sure what is going on. If you give me a profiling dump, then I may be able to get to the bottom of this. Can you try running the following, and send me the generated 'perf.prof' at matthew dot wardrop at airbnb dot com? Note that paths are shared as part of this profile, so double check to make sure nothing important is being leaked :).

import time
from cProfile import Profile
from pstats import Stats
import knowledge_repo


REPO_PATH = "<path of the knowledge repo in which slow post resides>"
SLOW_POST_PATH = "<path of slow post within repo>"

repo = knowledge_repo.KnowledgeRepository.for_uri(REPO_PATH)
app = repo.get_app()
client = app.test_client()

p = Profile()
response = p.runcall(client.get, '/post/slow_post/plots.kp')
print("Time taken to load post: {}".format(p.total_tt))

stats = Stats(p)
stats.dump_stats('perf.prof')

stats.sort_stats('tottime').print_stats()

matthewwardrop avatar Apr 29 '17 05:04 matthewwardrop

Hi @matthewwardrop,

thank you for the code! I presume p.runcall(client.get, '/post/slow_post/plots.kp') should be response = p.runcall(client.get, SLOW_POST_PATH) ? I run it today but the total time is below 1 second (0.751 seconds) :confused: A curl | wget localhost/post/slow/post.kp on the server itself gave me over 20 seconds for each call... I don't think it makes any sense to send you the stats.

It looks like the python webframework (flask?) within the knowledge-repo makes troubles. Is there a debug mode, with which we can track every step within the knowledge-repo / python while curling the slow test post?

Breaka84 avatar May 05 '17 15:05 Breaka84

Hi @rene-giretzlehner @matthewwardrop! I had the same problem, but after update python to 3.6 version everything works good. You can try to do this with virtualenv or conda.

I hope that helps.

Napoleon-Black avatar Aug 01 '17 07:08 Napoleon-Black

@rene-giretzlehner @Breaka84 See the PR above for our fix for slow and failing posts with images. Our setup is a bit hacky at the moment but we're shoring up the details now that we know it works.

alecraso avatar Aug 24 '17 02:08 alecraso

I ran into issues with very slow image loads on a git-based knowledge repo, but it was using Python 2. Preliminary testing seems to show massive speed improvements if I run Python 3 instead.

jdavidheiser avatar Oct 12 '17 23:10 jdavidheiser

If anyone else is running into this issue, upgrading to Python 3 does seem to resolve the problem. I have no idea why it fixes things, but sure enough it makes a big difference. The long term fix would be to use a post processor like the one that @aaronbiller merged, but this a quick and easy fix for the short-term.

Thanks for the heads up @Napoleon-Black.

jordan-wright avatar Mar 22 '18 22:03 jordan-wright