Performance: scrolling through a long list of threads is slow
When a notmuch search results in a lot of items, scrolling down or up by keeping the key pressed uses a lot of CPU and is visually slow.
How to reproduce:
- Use a search term which results in a lot of list items
- Keep the key pressed to "scroll" down or up
Attached is the output of a profile run: purebred_profiled.txt
Most of the cost is in brick itself. We may have to do some upstream optimisation work in brick to improve things.
@romanofski because the cost is almost entirely in brick, could you come up with a minimal test program that reproducibly puts brick through its paces (ideally without any user input) and results in a similar cost table? Then we can use that as the basis for optimising brick without anything purebred-related getting in the way.
Yeah that's actually a great idea, to write a smaller reproducer. Will do...
I'm having similar performance issues with a brick-based application I have written. When you say "a lot of list items", roughly how many items/rows are you talking about?
@c-lewis I think I observed it with even 50 items, although I would have to go back to verify. Definitely go with 500 items and you should see sluggish performance.
At first glance we render a lot of different fields, like:
listDrawMail :: AppState -> Bool -> NotmuchMail -> Widget Name
listDrawMail s sel a =
  let settings = view (asConfig . confNotmuch) s
      isNewMail = hasTag (view nmNewTag settings) a
      widget = padLeft (Pad 1) (txt $ formatDate (view mailDate a)) <+>
               padLeft (Pad 1) (renderAuthors sel $ view mailFrom a) <+>
               padLeft (Pad 1) (renderTagsWidget (view mailTags a) (view nmNewTag settings)) <+>
               padLeft (Pad 1) (txt (view mailSubject a)) <+> fillLine
  in withAttr (getListAttr isNewMail sel) widget
from https://github.com/purebred-mua/purebred/blob/master/src/UI/Index/Main.hs, and I think it's the combination of many items and the <+> combinators. Again, I would have to verify; this is just a hypothesis.
@romanofski this agrees with my observation that a lot of the cost is in the functions that compute where to paint / how much space is available for different widgets.
@romanofski here's a commit. Could you please do some profiling before/after this commit and publish the output? https://github.com/purebred-mua/purebred/commit/83a37f807b106d387e54b9a6a39da7ec0cd9d0be
Hm... nope. Procedure:
$ git rev-parse HEAD
83a37f807b106d387e54b9a6a39da7ec0cd9d0be
$ stack clean
$ stack build --profile
$ stack exec --rts-options "-P" purebred
New profile log: purebred_profiled.txt, which at first glance is not much different than the old log. Wondering if I stuffed something up o.O
Before I forget, the number of items displayed was a meagre 47.
Hm, ok. This needs a more controlled test to measure whether there's any tangible improvement. The technique definitely made a difference in a small test app for me, but maybe it is overwhelmed by other factors or the more complex UI, and the list rendering is not really a big contributor but a red herring instead (i.e. the rendering slowness is only noticed when scrolling, but it's bad all the time)?
@romanofski, thanks for the details. We’ve certainly noticed the sluggish performance with approx. 130 items and with rendering as complex as that shown above. As you said, the sluggish performance can be achieved with far fewer items.
What we observed is that the slow performance is due to a combination of the complex formatting and the fact that the viewable part of the list is rerendered with every event (e.g. moving to the next/previous list item causes the viewable portion of the list to be rerendered). We addressed this problem by giving each list item a unique brick name, and using brick's caching infrastructure to cache all of the list items and invalidate only selected/previously selected list items on list item change. This avoids the rerendering of all viewable list items.
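For reference, a minimal sketch of that per-item caching approach, assuming the brick 1.x API; the MailListItem constructor and the onSelectionChanged handler are hypothetical names for illustration, not purebred's actual types:

-- Minimal sketch of per-item caching with brick's rendering cache
-- (assumes the brick 1.x API; MailListItem and onSelectionChanged are
-- hypothetical names, not purebred's actual types).
import Brick.Types (Widget, EventM)
import Brick.Widgets.Core (cached)
import Brick.Main (invalidateCacheEntry)

data Name = MailListItem Int   -- one cache entry per list row
  deriving (Eq, Ord, Show)

-- Wrap a row's widget in the rendering cache, keyed by its list index,
-- so it is only re-evaluated after it has been explicitly invalidated.
drawItemCached :: Int -> Widget Name -> Widget Name
drawItemCached ix = cached (MailListItem ix)

-- When the selection moves, invalidate only the old and new rows so their
-- highlighting (and e.g. changed tags) are re-rendered; all other rows
-- stay cached.
onSelectionChanged :: Int -> Int -> EventM Name s ()
onSelectionChanged oldIx newIx = do
  invalidateCacheEntry (MailListItem oldIx)
  invalidateCacheEntry (MailListItem newIx)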
I've changed the little 'pidgets' app I wrote to try out new ideas around the layout of purebred. With https://github.com/romanofski/pidgets/commit/c541b8b0a10907242866d2841b4fb300d0aeedfb it should now show the same symptoms of sluggish performance we've seen in purebred. A quick profile at least looks very similar. Attached is the profiling result:
This can now be used to figure out where the problem is.
I'll give the idea @c-lewis mentioned a try and re-profile. I've looked up the rendering cache section in the handbook and it sounds like an approach worth comparing against.
I've filed an issue against brick in case there is more which can be done.
Thought github would create a link, but just for completeness: https://github.com/jtdaugherty/brick/issues/196
Hi, folks. Thanks for the report. To comment on a few things above:
- Using the rendering cache can indeed be important if the functions to produce your widgets are expensive to evaluate. Sometimes the cost is with Brick itself, but often the cost is in evaluating to a Widget in the first place, so caching the resulting Vty image in the rendering cache can be a good way to go. As with all caching approaches, invalidation becomes the tricky thing to manage.
- As for the mention of <+>, you should definitely consider using hBox (or vBox) instead of the binary operators any time you have more than two operands. The <+> and <=> operators are binary versions of hBox and vBox respectively and result in a binary tree of renderings, which can introduce more overhead than using hBox or vBox with a list of more than two operands.
- The List itself is designed to be efficient with respect to the number of elements in the list, in that it is guaranteed to only ever render at most enough elements to fill 2*N rows of space, where N is the total number of rows available in the list rendering area (i.e. a function of its surroundings). So this means that if a list in an area of height 20 and item height 1 has one million elements or one hundred elements, either way, only 40 elements ever get rendered. (I could go into why this is, but the main point is that the cost is capped regardless of the number of elements in the list.) This has been true since 0.2. Now, this doesn't mean that the per-item cost is necessarily low; it could still be high, and that's going to depend on your item rendering function's cost profile as well as how you use the brick API.
To follow up on my note above, listDrawMail and listDrawThread could both benefit from using hBox.
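To illustrate, here is a sketch (not the actual commit) of what that change might look like in the listDrawMail snippet quoted earlier, building the row with hBox over a list of columns instead of a chain of the binary <+>:

  -- Same columns as before, now passed to hBox as a single list
  -- (sketch only; variable names as in the listDrawMail snippet above).
  widget = hBox
    [ padLeft (Pad 1) (txt $ formatDate (view mailDate a))
    , padLeft (Pad 1) (renderAuthors sel $ view mailFrom a)
    , padLeft (Pad 1) (renderTagsWidget (view mailTags a) (view nmNewTag settings))
    , padLeft (Pad 1) (txt (view mailSubject a))
    , fillLine
    ]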
@romanofski pursuant to the comment above, please merge 83a37f807b106d387e54b9a6a39da7ec0cd9d0be.
@frasertweedale nice!
I've had a try using Brick's cache as mentioned by @c-lewis. This makes scrolling very swift. The cost is that the cache needs to be invalidated on several occasions in purebred's source. For example, tagging an item needs re-rendering to show the new tag.
I think @frasertweedale's improvements are already significant and I'd hate to optimise too early at this stage. Sprinkling cache invalidation calls here and there is something I would not like to introduce now. I'd rather wait and think about a better architecture when we need it.
For reference, here is the profiling output using the cache for each list item:
Unless there is anything missing, I suggest we close this for now and focus on other things for 1.0. It's good enough for me.
@romanofski I think leave it open, push your changes to a branch, and then just move the ticket to a future milestone.
Fair enough. Branch pushed.
On the topic of the box combinator performance issue I mentioned earlier in this discussion: as of Brick 0.46 that is no longer an issue. You can use chains of <+> and <=> and they'll get optimized into calls to hBox and vBox automatically via https://github.com/jtdaugherty/brick/commit/dc0ae4627e23e11e8e76ad96a4c3bd456a814cfb.
@romanofski how do you find the thread list rendering performance these days? Is it still an issue or can we close this ticket?
I'd say it's fine. Unless we have a more substantial or realistic reference, I think it's fine to close this.