Performance Optimization and Vulkan Considerations for the Blur Effect
Hey Wayfire devs,
So I've been looking at how blur works in Wayfire, especially the wf_gaussian_blur and Kawase implementations. The shaders are clean and the visuals are great: the Gaussian blur shader uses five weighted samples per axis in separate horizontal and vertical passes (gaussian_fragment_shader_horz/_vert), which is pretty standard. But performance can get heavy, especially with larger blur radii or high-res outputs, because blur_fb0() and render_iteration() do multiple framebuffer reads/writes. Some OpenGL tweaks could help: reducing iterations when the blur radius is small, precomputing offsets/weights instead of evaluating them per fragment, or using half-float textures to cut memory bandwidth.
Honestly, if blur ran on Vulkan, it could be way faster. You could turn render_iteration() into a compute shader: use local workgroups, shared memory for row/column data, and preloaded kernel weights. That way you keep the exact same visuals but fully leverage GPU parallelism. The damage-region and framebuffer logic (prepare_blur(), copy_region()) could stay the same at first, so the visual behavior doesn't change. This could even be done incrementally: OpenGL optimizations first, then a Vulkan compute shader pass. Overall, the goal is to keep the visuals intact while making the code much more efficient, especially at higher resolutions or heavier blur settings.
I agree. Patches and PRs welcome. 👍
@saberr26 I am not sure where you see compute shaders helping. We still need the previous pass to complete before starting the next one, and for calculations as simple as ours, compute shaders are not faster than fragment shaders. If you know some other algorithms, feel free to let me know, or send a patch :)
That aside, we can use compute shaders with OpenGL too, so that's not a huge reason to port to Vulkan.
Lastly, you can tune blur in different ways. Kawase is the cheapest algorithm by far, and you can tune the degrade option as well (higher == more blur and better perf, but at some point quality degrades too much, as you need to increase iterations too...)
Thanks for the explanation! I get now why compute shaders wouldn't really help with the current Gaussian blur, since each pass depends on the previous one. Makes sense. About other algorithms: I was thinking of something like a dual Kawase blur, basically Kawase-style passes on a chain of progressively downscaled buffers followed by upsampling passes, instead of repeated full-resolution passes. It could give almost the same visual quality with fewer framebuffer reads/writes, and you could still tune it with degrade and iterations like you do now to balance quality and speed. Also, some small OpenGL tweaks might help a bit: precomputing offsets/weights, using half-float textures to save memory bandwidth, or skipping extra iterations when the blur radius is small. Nothing that changes how it looks, but it could improve performance on high-res outputs.