[Impeller] absorb the last blur pass into an entity instead of baking it in a render pass
fixes https://github.com/flutter/flutter/issues/150720
Pre-launch Checklist
- [ ] I read the Contributor Guide and followed the process outlined there for submitting PRs.
- [ ] I read the Tree Hygiene wiki page, which explains my responsibilities.
- [ ] I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
- [ ] I listed at least one issue that this PR fixes in the description above.
- [ ] I added new tests to check the change I am making or feature I am adding, or the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
- [ ] I updated/added relevant documentation (doc comments with ///).
- [ ] I signed the CLA.
- [ ] All existing and new tests are passing.
If you need help, consider asking for advice on the #hackers-new channel on Discord.
I have it mostly implemented, with a bug in the blur step size that I still need to fix. I think the performance will be representative, though; here's where we are (a 25% increase in fps):
In the current state a non-uniform blur in just the x direction looks shimmery in GaussianBlurAnimatedBackdrop. I checked against ToT and it's the case there too.
It looks like we need to scale the pixel size by 0.5 to get the correct step size when rendering inline. In the backdrop filter case that factor seems accessible via the effect_transform, but I'm not seeing an easy way to derive it in the image filter case. We might have to pipe the DPI scalar down to this point, or find a way to do the blur pass that is more amenable to the math we have available.
Is the intention of this to basically render the second directional blur pass directly into the parent layer RenderPass, thereby rendering it at the full size of the blur?
Idea being that the extra volume of fragment computations is hopefully cheaper than the cost of rendering to a smaller texture first?
Nope, the idea is just to eliminate a render pass which carries a lot of overhead. This is registering about 20% faster in the backdrop filter blur benchmark.
Oh, is the render pass that's being eliminated not what the second blur kernel pass writes to?
The thing that the second blur pass was rendering to was typically the same thing the downsample pass wrote to, so it's not eliminated.
The following patch seems to render things mostly correctly. Here is the render versus the golden before the change. I have no idea where this 2.0 factor is coming from, though.
--- a/impeller/entity/contents/filters/gaussian_blur_filter_contents.cc
+++ b/impeller/entity/contents/filters/gaussian_blur_filter_contents.cc
@@ -751,11 +751,9 @@ std::optional<Entity> GaussianBlurFilterContents::RenderFilter(
pass2_out.value().GetRenderTargetTexture();
blur_output_entity.SetBlendMode(entity.GetBlendMode());
blur_output_entity.SetTransform(blur_output_entity_transform);
- constexpr Vector2 orthographic_size(2.0f, 2.0f);
- Vector2 inline_pixel_size = pass1_pixel_size / orthographic_size;
blur_output_entity.SetContents(Contents::MakeAnonymous(
/*render_proc=*/
- [sampler_desc, texture, inline_pixel_size, blur_info,
+ [sampler_desc, texture, pass1_pixel_size, blur_info,
downsample_pass_args](const ContentContext& renderer,
const Entity& entity, RenderPass& pass) {
Quad blur_uvs = {Point(0, 0), Point(1, 0), Point(0, 1), Point(1, 1)};
@@ -763,13 +761,16 @@ std::optional<Entity> GaussianBlurFilterContents::RenderFilter(
Point(0, 0), Point(texture->GetSize().width, 0),
Point(0, texture->GetSize().height),
Point(texture->GetSize().width, texture->GetSize().height)};
+ Scalar fudge_scalar = 2.0f;
return Render1DBlur(
renderer, pass, blur_vertices, blur_uvs, sampler_desc, texture,
BlurParameters{
- .blur_uv_offset = Point(inline_pixel_size.x, 0.0),
- .blur_sigma = blur_info.scaled_sigma.x *
+ .blur_uv_offset =
+ Point(pass1_pixel_size.x / fudge_scalar, 0.0),
+ .blur_sigma = fudge_scalar * blur_info.scaled_sigma.x *
downsample_pass_args.effective_scalar.x,
.blur_radius =
+ fudge_scalar *
ScaleBlurRadius(blur_info.blur_radius.x,
downsample_pass_args.effective_scalar.x),
.step_size = 1,
I chatted with @bdero. The fudge factor is the downscale amount; I wasn't thinking. The inline blur pass will now have to be scaled to match the destination size we are rendering to, which makes me a bit more dubious that this will be beneficial except in cases of small blur sigmas.
That fudge scalar isn't directly the downsample scalar. I tried that and it doesn't work. It makes sense that we have to blur more when drawing directly into the parent pass, since it will be at a higher resolution. I ran the golden tests, and 2.0 renders things correctly. Notice how the blur radius stays mostly the same here, even though the downscale factor is not 1/2:
However doing the horizontal blur pass in the destination breaks the work we did to make rotated uniform blurs work:
Here are the measurements before and after with the 2.0 fudge factor that seems to render things correctly:
| before | after |
|---|---|
Given that we have to scale the kernel and we'd only be able to use this for unrotated blurs, the window of opportunity where this could help is shrinking.