Implement hardware accelerated rendering with WebGPU/Vello

Open kkoreilly opened this issue 1 year ago • 32 comments

Describe the feature

Currently, scrolling on web is very laggy and choppy. We need to profile it and fix the bottlenecks we find.

Relevant code

No response

kkoreilly avatar Jul 25 '24 23:07 kkoreilly

Here is some profiling data from scrolling on our initial release blog post in Chrome on macOS:

  • 65.3% of time is spent on widget rendering (not involving system rendering, layout, styles, events, etc)
  • 33.8% on image/draw.Draw (11.0% from core.Image.Render, 10.9% from paint.Context.DrawBox, 7.9% from paint.Text.Render, and 3.6% from system.DrawerBase.Copy)
  • 28.7% on paint.Context.DrawStandardBox (15.8% from core.Frame.Render, 2.7% from core.Text.Render)
  • 22.2% on core.Scene.layoutScene
  • 21.3% on core.Image.Render
  • 16.3% on paint.Text.LayoutStdLR (11.0% from core.Text.configTextSize and 5.0% from core.Text.configTextAlloc)
  • 16.2% on core.Text.RenderWidget (15.8% on core.Text.Render)
  • 14.3% on paint.Text.Render (13.0% from core.Text.Render)
  • 14.1% on raster.Path.AddTo (11.1% from paint.Context.StrokePreserve and 2.9% from paint.Context.FillPreserve)

Overall, this paints a very clear picture: the biggest bottleneck when scrolling a blog post on web is rendering, with text and images being major bottlenecks. Standard box rendering, mainly from frames and other assorted widgets, also contributes a lot.

kkoreilly avatar Jul 25 '24 23:07 kkoreilly

it should not be calling layoutScene or LayoutStdLR during scrolling. are you sure you limited the profiling to after the initial render and just when scrolling? otherwise, it is just a lot of rendering, which is kind of inevitable.. you can also get this data on native mac -- likely the same right?
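One way to limit a native profile to the scrolling window only is to start and stop the Go CPU profiler around that window. The following is a minimal sketch using the standard `runtime/pprof` package; `profileCPU` and `busy` are hypothetical helpers for illustration, and `busy` just stands in for "scroll the page".

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// profileCPU runs work while the Go CPU profiler is active and returns
// the raw pprof data. Only the time spent inside work is sampled, so
// the initial render can be kept out of the profile.
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

// busy burns CPU for roughly d so the profiler has something to sample.
// It is a placeholder for the real scrolling workload.
func busy(d time.Duration) {
	end := time.Now().Add(d)
	x := 0
	for time.Now().Before(end) {
		x++
	}
	_ = x
}

func main() {
	data, err := profileCPU(func() { busy(200 * time.Millisecond) })
	if err != nil {
		fmt.Println("profile failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of profile data\n", len(data))
}
```

The returned bytes can be written to a file and inspected with `go tool pprof`.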

rcoreilly avatar Jul 26 '24 00:07 rcoreilly

Using print statements, I can confirm that LayoutStdLR is definitely being called during scrolling.

Different things may be more of a bottleneck on web versus other platforms; it is also good to confirm that the actual web rendering is only around 4% of the total time.

kkoreilly avatar Jul 26 '24 00:07 kkoreilly

An actual WidgetBase.NeedsLayout is not happening however.

kkoreilly avatar Jul 26 '24 00:07 kkoreilly

that would take 0% time and likely doesn't show up. it really shouldn't be doing layoutscene otherwise.

rcoreilly avatar Jul 26 '24 00:07 rcoreilly

using Ctrl+Alt+R (which didn't actually work -- had to pull up settings and use the menu):

core.(*Scene).doUpdate-render                               Total: 3011.80 ms	Avg:  7.97	N:   378	Pct: 99.46
core.(*Scene).doUpdate-restyle                              Total:   11.49 ms	Avg: 11.49	N:     1	Pct:  0.38
core.(*Scene).doUpdate-layout                               Total:    4.67 ms	Avg:  4.67	N:     1	Pct:  0.15
tree.(*Plan).Update-plan.Update                             Total:    0.22 ms	Avg:  0.01	N:    24	Pct:  0.01

so one time somebody called restyle, but the rest of the time was rendering.
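For reading the table above: Avg is Total divided by N, and Pct appears to be each row's share of the summed totals (that second part is an assumption, but it reproduces the printed numbers). A small sketch that recomputes both columns from the Total and N values; the `entry` type and helper names are made up for this example.

```go
package main

import "fmt"

// entry mirrors one row of the profile output: a name, the total
// time in milliseconds, and the number of calls.
type entry struct {
	name    string
	totalMs float64
	n       int
}

// avg returns the mean time per call in milliseconds.
func (e entry) avg() float64 { return e.totalMs / float64(e.n) }

// pct returns e's share of sum as a percentage.
// Assumption: the profiler reports Pct relative to the sum of all rows.
func pct(e entry, sum float64) float64 { return 100 * e.totalMs / sum }

// sumTotals adds up the total time of all rows.
func sumTotals(es []entry) float64 {
	sum := 0.0
	for _, e := range es {
		sum += e.totalMs
	}
	return sum
}

// rows returns the Total and N values copied from the output above.
func rows() []entry {
	return []entry{
		{"core.(*Scene).doUpdate-render", 3011.80, 378},
		{"core.(*Scene).doUpdate-restyle", 11.49, 1},
		{"core.(*Scene).doUpdate-layout", 4.67, 1},
		{"tree.(*Plan).Update-plan.Update", 0.22, 24},
	}
}

func main() {
	es := rows()
	sum := sumTotals(es)
	for _, e := range es {
		fmt.Printf("%-32s Total: %8.2f ms  Avg: %5.2f  N: %5d  Pct: %5.2f\n",
			e.name, e.totalMs, e.avg(), e.n, pct(e, sum))
	}
}
```

So the render row works out to roughly 8 ms per frame on average, which is why rendering dominates everything else here.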

rcoreilly avatar Jul 26 '24 00:07 rcoreilly

To clarify: I added a print statement to NeedsLayout, and it is not being triggered. The same is true for layoutScene, so I will redo my testing. LayoutStdLR was definitely happening though, but that is probably mostly from plots and text editors.

kkoreilly avatar Jul 26 '24 00:07 kkoreilly

We decided on putting profile in the settings menu (you can also press Ctrl+Alt+R in the settings window), but I am fine moving it to the main menu instead.

kkoreilly avatar Jul 26 '24 00:07 kkoreilly

that restyle happens when a tooltip is activated.

rcoreilly avatar Jul 26 '24 00:07 rcoreilly

good point about the plots and text editors wrt LayoutStdLR

rcoreilly avatar Jul 26 '24 00:07 rcoreilly

I redid my web profiling and there was no more layoutScene, but there was still LayoutStdLR, with almost all of it coming from core.Text.

kkoreilly avatar Jul 26 '24 00:07 kkoreilly

u gotta figure out where that is coming from! very bad. core.Text Render does not call it.

rcoreilly avatar Jul 26 '24 00:07 rcoreilly

I set breakpoints in places where core.Text calls paintText.Layout and it does not hit that during standard mac scrolling of blog page.

rcoreilly avatar Jul 26 '24 01:07 rcoreilly

This is very troubling: it is definitely not calling configTextSize during scrolling on native macOS, but it is calling it all the time while scrolling on web.

kkoreilly avatar Jul 26 '24 01:07 kkoreilly

Somehow it keeps getting the Scene.needsLayout flag.

kkoreilly avatar Jul 26 '24 01:07 kkoreilly

The text editors are causing it to repeatedly call layoutScene but only on web!

kkoreilly avatar Jul 26 '24 01:07 kkoreilly

The toolbar overflow menu is triggering the layout calls!

kkoreilly avatar Jul 26 '24 01:07 kkoreilly

It is not web-specific; on any platform, if there are items moved to the overflow menu, it calls NeedsLayout constantly when you go by a text editor with a scrollbar!
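To illustrate the general kind of guard that avoids this sort of repeated layout request (this is not necessarily what the actual fix does), here is a sketch that only asks for a layout when the set of overflow items actually changes. The `toolbar` type, its fields, and `setOverflow` are hypothetical and are not the Cogent Core API.

```go
package main

import "fmt"

// toolbar is a hypothetical stand-in for a toolbar widget.
type toolbar struct {
	overflow       []string // items currently moved to the overflow menu
	layoutRequests int      // how many times a layout was requested
}

// equalStrings reports whether a and b hold the same items in order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// setOverflow records the new overflow items and requests a layout
// only if they differ from the previous ones.
func (tb *toolbar) setOverflow(items []string) {
	if equalStrings(tb.overflow, items) {
		return // nothing changed, so no new layout is needed
	}
	tb.overflow = append([]string(nil), items...)
	tb.layoutRequests++
}

func main() {
	tb := &toolbar{}
	// Simulate many render passes that compute the same overflow items.
	for i := 0; i < 100; i++ {
		tb.setOverflow([]string{"Save", "Print"})
	}
	fmt.Println("layout requests:", tb.layoutRequests)
}
```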

kkoreilly avatar Jul 26 '24 01:07 kkoreilly

This is much better with #1059, although it still gets tripped up with the image resizing the first time you pass the image.

kkoreilly avatar Jul 26 '24 02:07 kkoreilly

With #1060, I no longer have any noticeable lag on my phone on web.

kkoreilly avatar Jul 26 '24 05:07 kkoreilly

The scrolling on macOS web is still notably less smooth than macOS native.

kkoreilly avatar Jul 26 '24 05:07 kkoreilly

Note that although the text editor basic rendering does not cause significant lagging on mobile web anymore, if you actually enter the text editor and select things and exit it, it frequently crashes.

kkoreilly avatar Jul 26 '24 19:07 kkoreilly

The plan at this point is to switch from Vulkan to WebGPU (#507), so that we can have hardware-accelerated rendering on all platforms. After that, we will see about either directly wrapping https://github.com/linebender/vello or, ideally, making our own Go-based WebGPU wrappers around their .wgsl rasterizing shaders. Currently we are using a CPU-based rasterizer, https://github.com/srwiley/rasterx, which is pretty impressive for CPU, but it suffers considerably when going through the Go -> WASM translation.

rcoreilly avatar Jul 26 '24 19:07 rcoreilly

I think slowing down the scroll speed could hide the problem, and maybe even solve it.

baxiry avatar Jul 29 '24 18:07 baxiry

@baxiry the standard behavior is to match the speed of your finger, which we recently accomplished. having it be slower than your finger moves would probably be disconcerting.

rcoreilly avatar Jul 29 '24 22:07 rcoreilly

Yes, the speed should be more natural now, which should also mean less choppiness. @rcoreilly, what would you think about being more aggressive in filtering scroll events on web so that it has to update less often? That would make it maybe slightly less smooth, but if it prevents the system from being overwhelmed, it might end up looking better. Also we could consider lowering the FPS of our new continued scroll velocity feature (it is currently 60 but we could try 30 etc). Regardless, I think we can consider these changes after we implement WebGPU and GopherJS, which should make the problem a lot less bad.

kkoreilly avatar Jul 29 '24 22:07 kkoreilly

yeah smoothness = more frequent updates, not less.. and the filtering is entirely a function of how slow the update is -- it doesn't have any parameters. would be worth trying the 30 fps on the momentum scroll, just to see.
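A rough sketch of what parameter-free filtering like this can look like: after taking one scroll event, drain whatever else has queued up while the last render was running and merge it into a single delta. The slower the render, the more events get merged, with no tuning knob involved. The `coalesce` function and the channel setup are assumptions for illustration, not the actual event code.

```go
package main

import "fmt"

// coalesce merges first with every scroll delta already waiting in
// pending, without blocking. It returns the summed delta and how many
// extra events were merged into it.
func coalesce(first float32, pending <-chan float32) (total float32, merged int) {
	total = first
	for {
		select {
		case d := <-pending:
			total += d
			merged++
		default:
			// Nothing else is queued, so render with what we have.
			return total, merged
		}
	}
}

func main() {
	events := make(chan float32, 16)
	// Pretend these deltas arrived while a slow render was in progress.
	for _, d := range []float32{2, 3, -1, 4} {
		events <- d
	}
	first := <-events
	total, merged := coalesce(first, events)
	fmt.Println(total, merged) // prints "8 3"
}
```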

rcoreilly avatar Jul 29 '24 22:07 rcoreilly

I can understand that a single code-base for any platform is a very desirable thing, and this is a more valiant and impressive attempt than all previous attempts I've seen. But fighting against native browser performance, which has been tuned and optimised for decades, might be the wrong approach.

One way forward might be to auto-convert cogent code into a plain HTML/CSS/JS manifest.

nkev avatar Jul 30 '24 21:07 nkev

@nkev, I definitely understand where you are coming from with that. We are already planning to implement an HTML preview while pages are loading, which will also improve SEO (see https://github.com/cogentcore/core/issues/702#issuecomment-2253756737). We hope that new feature in combination with other things such as GopherJS (#974) and WebGPU (#507) will make the performance viable on web. If that does not succeed in doing so, we will be open to considering other options.

kkoreilly avatar Jul 30 '24 22:07 kkoreilly

The ongoing #1457 PR should hopefully allow for major web performance improvements, including by directly sending path rendering commands to the native HTML canvas renderer, allowing for optimized GPU acceleration. It should also create the necessary infrastructure for a future implementation of GPU-accelerated rendering on all platforms, in addition to overall rendering optimizations that should make CPU faster as well. It may also address #568.
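As a rough illustration of what "sending path rendering commands to the canvas" means, the sketch below turns a polygon into the sequence of Canvas 2D calls (`beginPath`, `moveTo`, `lineTo`, `closePath`, `fill`) that would draw it. It only builds the calls as strings so that it runs anywhere; a real web backend would issue them through `syscall/js` instead. The `point` type and `canvasCalls` function are hypothetical and do not reflect the design in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// point is a 2D coordinate in canvas pixels.
type point struct{ x, y float64 }

// canvasCalls translates a closed polygon into the Canvas 2D method
// calls that would fill it. The browser then rasterizes the path
// natively instead of the Go code doing it on the CPU.
func canvasCalls(poly []point) []string {
	if len(poly) == 0 {
		return nil
	}
	calls := []string{"beginPath()"}
	for i, p := range poly {
		op := "lineTo"
		if i == 0 {
			op = "moveTo" // the first point starts the subpath
		}
		calls = append(calls, fmt.Sprintf("%s(%g, %g)", op, p.x, p.y))
	}
	return append(calls, "closePath()", "fill()")
}

func main() {
	tri := []point{{0, 0}, {10, 0}, {10, 5}}
	fmt.Println(strings.Join(canvasCalls(tri), "\n"))
}
```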

After that is merged, the setTimeout "violation"/warning in the Chrome dev tools should go away, as the performance should be better, but we can check that.

kkoreilly avatar Jan 29 '25 06:01 kkoreilly