sympy-bot-old
sympy-bot-old copied to clipboard
Optimize the "closed pull requests" page
Currently the closed pull requests page takes about 7s to load, while other pages take < 1s. The reason is that for each of the 600 closed pull requests, we access the User
entity with an implicit GET
query to the database, so the time adds up. It has to be either avoided, or pre-fetched somehow.
I vaguely remember there being some thing on the app engine that profiled the webpage for you. Or maybe it was Google Analytics.
I did profiling by hand. I have disabled the template and did:
class MainPage(RequestHandler):
def get(self):
logging.info("XX1")
q = PullRequest.all()
q.order("last_updated")
logging.info("XX2")
# This is the request that wasn't updated for the longest time:
p = q.get()
if p is None:
last_update = None
last_update_pretty = "never"
else:
last_update = p.last_updated
last_update_pretty = pretty_date(last_update)
logging.info("XX3")
q = PullRequest.all()
q.filter("state =", "open")
q.order("last_updated")
logging.info("XX4")
# This is the open request that wasn't updated for the longest time:
p = q.get()
logging.info("XX5")
if p is None:
last_quick_update = None
last_quick_update_pretty = "never"
else:
last_quick_update = p.last_updated
last_quick_update_pretty = pretty_date(last_quick_update)
logging.info("XX6")
p_mergeable = PullRequest.all()
p_mergeable.filter("mergeable =", True)
p_mergeable.filter("state =", "open")
p_mergeable.order("-created_at")
logging.info("XX7")
p_nonmergeable = PullRequest.all()
p_nonmergeable.filter("mergeable =", False)
p_nonmergeable.filter("state =", "open")
p_nonmergeable.order("-created_at")
logging.info("XX8")
p_closed = PullRequest.all()
p_closed.filter("state =", "closed")
p_closed.order("-created_at")
logging.info("XX9")
p_mergeable.fetch(700)
logging.info("XX9.1")
p_nonmergeable.fetch(700)
logging.info("XX9.2")
p_closed.fetch(700)
logging.info("XX9.5")
for x in p_mergeable:
self.response.out.write("%d %s<br/>" % (x.num, x.title))
for x in p_nonmergeable:
self.response.out.write("%d %s<br/>" % (x.num, x.title))
for x in p_closed:
self.response.out.write("%d %s<br/>" % (x.num, x.title))
#self.render("index.html", {
# "pullrequests_mergeable": p_mergeable,
# "pullrequests_nonmergeable": p_nonmergeable,
# "pullrequests_closed": p_closed,
# "last_update": last_update,
# "last_update_pretty": last_update_pretty,
# "last_quick_update": last_quick_update,
# "last_quick_update_pretty": last_quick_update_pretty,
# })
logging.info("XX10")
This produces the following log:
2011-10-16 20:38:35.690 / 200 3965ms 14948cpu_ms 12218api_cpu_ms 12kb Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
69.227.158.78 - - [16/Oct/2011:20:38:35 -0700] "GET / HTTP/1.1" 200 12906 - "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1" "3.sympy3.appspot.com" ms=3965 cpu_ms=14948 api_cpu_ms=12218 cpm_usd=0.416711 throttle_code=1 instance=00c61b117ca9c87b85e97cc715f3d0d82762
I 2011-10-16 20:38:31.749
XX1
I 2011-10-16 20:38:31.749
XX2
I 2011-10-16 20:38:32.427
XX3
I 2011-10-16 20:38:32.427
XX4
I 2011-10-16 20:38:32.542
XX5
I 2011-10-16 20:38:32.542
XX6
I 2011-10-16 20:38:32.543
XX7
I 2011-10-16 20:38:32.543
XX8
I 2011-10-16 20:38:32.543
XX9
I 2011-10-16 20:38:32.571
XX9.1
I 2011-10-16 20:38:32.602
XX9.2
I 2011-10-16 20:38:33.931
XX9.5
I 2011-10-16 20:38:35.685
XX10
So you can see, that most of the time (1.7s) is spending in iterating the query when printing it out (XX9.5
to XX10
), also 1.3s is spend in fetching the closed issues from the database. That's quite bad, I was hoping that GAE's datastore will be very fast. Then another second or two in the template (skipped above) and we get 6 or 7s per request. For now I'll leave it be, the solution most probably is to simply not show closed issues on the front page, then it will be reasonably fast.
Lots of good tips are here:
http://www.small-improvements.com/app-engine-performance-tuning
I have split the closed requests from the front page, now the front page loads in < 1s. The closed requests page loads in 7s.
I am changing the title of this issue (to optimize the closed requests page), which is low priority. The conclusion is:
- GAE datastore is actually very quick even for complex queries (like the last change date/time, which needs to retrieve all pull requests, sort them and get the last) etc, but only as long as we retrieve just 1, or a few (less than 20).
- the closed pull requests gets all the results, and also for each item, it accesses the user entity -- so I think there is a query into the datastore for each item. Since there is ~600 of items, the time adds up. The solution is to avoid the implicit
GET
query for each item somehow (prefetch the users, or something). See the link above for some ideas. Anyway, this is not a high priority, because one doesn't need to access the closed pull requests that often.
Right. There are higher priority things. Like putting the data in a nice table so it's easier to read, and possibly has little search fiels on each column header to narrow down the results (like they have in the new Melange). There's probably some nice html5 template for this that we could just use.