purebred Performance: scrolling through a long list of threads is slow

When a notmuch search results in a lot of items, scrolling down or up by keeping the key pressed uses a lot of CPU and is visually slow.

How to reproduce:

Use a search term which results in a lot of list items
keep the key pressed to "scroll" down or up

Attached is a the output of a profile run: purebred_profiled.txt

Oct 15 '18 05:10 romanofski

Most of the cost is in brick itself. We may have to do some upstream optimisation work in brick to improve things.

Oct 15 '18 06:10 frasertweedale

@romanofski because the cost is almost entirely in brick, could you come up with a minimal test program that reproducibly puts brick through its paces (ideally without any user input) and results in a similar cost table? Then we can use that as the basis for optimising brick without anything purebred-related getting in the way.

Oct 17 '18 05:10 frasertweedale

Yeah that's actually a great idea, to write a smaller reproducer. Will do...

Oct 17 '18 08:10 romanofski

I'm having similar performance issues with a brick-based application I have written. When you say, "a lot of list items", about how many items/rows are you talking about?

Oct 30 '18 17:10 c-lewis

@c-lewis I think I observed it with even 50 items, although I would have to go back to verify. Definitely go with 500 items and you should see a sluggish performance.

From a first glance we render a lot of different fields like:

listDrawMail :: AppState -> Bool -> NotmuchMail -> Widget Name
listDrawMail s sel a =
    let settings = view (asConfig . confNotmuch) s
        isNewMail = hasTag (view nmNewTag settings) a
        widget = padLeft (Pad 1) (txt $ formatDate (view mailDate a)) <+>
                 padLeft (Pad 1) (renderAuthors sel $ view mailFrom a) <+>
                 padLeft (Pad 1) (renderTagsWidget (view mailTags a) (view nmNewTag settings)) <+>
                 padLeft (Pad 1) (txt (view mailSubject a)) <+> fillLine
in withAttr (getListAttr isNewMail sel) widget

from https://github.com/purebred-mua/purebred/blob/master/src/UI/Index/Main.hs and I think it's the combination with many items and the combinators <+>. Again would have to verify, just a hypothesis.

Oct 30 '18 22:10 romanofski

@romanofski this agrees with my observation that a lot of the cost is in the functions that compute where to paint / how much space is available for different widgets.

Oct 31 '18 01:10 frasertweedale

@romanofski here's a commit. Could you please do some profiling before/after this commit and publish the output? https://github.com/purebred-mua/purebred/commit/83a37f807b106d387e54b9a6a39da7ec0cd9d0be

Oct 31 '18 10:10 frasertweedale

Hm... nope. Procedure:

$ git rev-parse HEAD
83a37f807b106d387e54b9a6a39da7ec0cd9d0be
$ stack clean
$ stack build --profile
$ stack exec --rts-options "-P" purebred

New profile log: purebred_profiled.txt which is not so much different then the old log from the first glimpse. Wondering if I stuffed something up o.O

Nov 01 '18 09:11 romanofski

Before I forget, the amount of items displayed were a meagre: 47

Nov 01 '18 10:11 romanofski

Hm, ok. Needs a more controlled test to better measure if there's any tangible improvement. This technique definitely made a difference in a small test app for me, but maybe it is overwhelmed by other factors or the more complex UI, and the list rendering is not really a big contributor but a red herring instead (i.e. the rendering slowness is only noticed on scrolling, but it's bad all the time)?

Nov 01 '18 10:11 frasertweedale

@romanofski, thanks for the details. We’ve certainly noticed the sluggish performance with approx. 130 items and with rendering as complex as that shown above. As you said, the sluggish performance can be achieved with far fewer items.

What we observed is that the slow performance is due to a combination of the complex formatting and the fact that the viewable part of the list is rerendered with every event (e.g. moving to the next/previous list item causes the viewable portion of the list to be rerendered). We addressed this problem by giving each list item a unique brick name, and using brick's caching infrastructure to cache all of the list items and invalidate only selected/previously selected list items on list item change. This avoids the rerendering of all viewable list items.

Nov 01 '18 13:11 c-lewis

I've changed the little 'pidgets' app I wrote to try out new ideas around the layouting of purebred. In https://github.com/romanofski/pidgets/commit/c541b8b0a10907242866d2841b4fb300d0aeedfb it should now show the same symptoms of sluggish performance we've seen in purebred. A quick profile at least looks very very similar. Attached is the profiling result:

pidgets.prof.txt

This can now be used to figure out where the problem is.

Nov 03 '18 23:11 romanofski

I'll give the idea @c-lewis mentioned a try and re-profile. I've looked up the rendering cache section in the handbook and it sounds like an idea to compare with.

I've filed an issue against brick in case there is more which can be done.

Nov 15 '18 10:11 romanofski

Thought github would create a link, but just for completeness: https://github.com/jtdaugherty/brick/issues/196

Nov 15 '18 10:11 romanofski

Hi, folks. Thanks for the report. To comment on a few things above:

Using the rendering cache can indeed be important if the functions to produce your widgets are expensive to evaluate. Sometimes the cost is with Brick itself, but often the cost is in evaluating to a Widget in the first place, so caching the resulting Vty image in the rendering cache can be a good way to go. As with all caching approaches, invalidation becomes the tricky thing to manage.
As for the mention of <+>, you should definitely consider using hBox (or vBox) instead of the binary operators any time you have more than two operands. The <+> and <=> are binary versions of hBox and vBox respectively and result in a binary tree of renderings which can introduce more overhead than using hBox or vBox with a list of more than two operands.
The List itself is designed to be efficient with respect to the number of elements in the list, in that it is guaranteed to only ever render at most enough elements to fill 2*N rows of space, where N is the total rows available in the list rendering area (i.e. a function of its surroundings). So this means that if a list in an area of height 20 and item height 1 has one million elements or one hundred elements, either way, only 40 elements ever get rendered. (I could go into why this is, but the main point is that the cost is capped regardless of the number of elements in the list.) This has been true since 0.2. Now, this doesn't mean that the per-item cost is necessarily low; it could still be high, and that's going to depend on your item rendering function cost profile as well as how you use the brick API.

Nov 15 '18 17:11 jtdaugherty

To follow up on my note above, listDrawMail and listDrawThread could both benefit from using hBox.

Nov 15 '18 17:11 jtdaugherty

@romanofski pursuant to above comment, please merge 83a37f807b106d387e54b9a6a39da7ec0cd9d0be.

Nov 16 '18 05:11 frasertweedale

@frasertweedale nice!

Nov 17 '18 02:11 jtdaugherty

I've had a try using Brick's cache as mentioned by @c-lewis. This makes scrolling very swift. The cost would be to invalidate the cache in several occasions in purebred's source. For example, tagging an item needs re-rendering to show the new tag.

I think @frasertweedale improvements are already significant and I'd hate to optimise too early at this stage. Sprinkling cache invalidation calls here and there is something I would not like to introduce now. I'd rather wait and think about a better architecture when we need it.

For reference, profiling using the caching for each list item:

purebred_with_caching.txt

Unless there is anything missing, I suggest we close this for now and focus on other things for 1.0. It's good enough for me.

Nov 17 '18 04:11 romanofski

@romanofski I think leave it open, push your changes to a branch, and then just move the ticket to a future milestone.

Nov 17 '18 23:11 frasertweedale

Fair enough. Branch pushed.

Nov 18 '18 05:11 romanofski

On the topic of the box combinator performance issue I mentioned earlier in this discussion, as of Brick 0.46 that is no longer an issue. You can use chains of <+> and <=> and they'll get optimized into calls to hBox and vBox automatically via https://github.com/jtdaugherty/brick/commit/dc0ae4627e23e11e8e76ad96a4c3bd456a814cfb.

Dec 27 '18 22:12 jtdaugherty

@romanofski how do you find the thread list rendering performance these days? Is it still an issue or can we close this ticket?

Sep 29 '22 03:09 frasertweedale

I'd say it's fine. Unless we have something more substantial or realistic reference I think it's fine to close this.

Oct 03 '22 01:10 romanofski

purebred purebred copied to clipboard

Performance: scrolling through a long list of threads is slow

purebred
purebred copied to clipboard