Makie.jl

fix low hanging fruits for render performance

Open SimonDanisch opened this issue 1 year ago • 2 comments

Description

using GLMakie
using BenchmarkTools
f, ax, pl = scatter(1:5);
sc = display(f)
@btime GLMakie.render_frame(sc)

Makie master

118.770 μs (1193 allocations: 30.41 KiB)

With sorting change

104.764 μs (729 allocations: 25.48 KiB)

With framebuffer_size optimization

85.648 μs (727 allocations: 25.45 KiB)
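
To put the three measurements above side by side, the short sketch below computes the speedup relative to master and the number of allocations removed per frame. The numbers are copied from the runs above; the variable and field names (`runs`, `time_us`, `allocs`) are made up for this illustration.

```julia
# Timings (μs) and allocation counts copied from the three runs above.
runs = [
    (label = "Makie master",                  time_us = 118.770, allocs = 1193),
    (label = "sorting change",                time_us = 104.764, allocs = 729),
    (label = "framebuffer_size optimization", time_us = 85.648,  allocs = 727),
]

base = first(runs)
for r in runs
    speedup = base.time_us / r.time_us   # > 1 means faster than master
    saved   = base.allocs - r.allocs     # allocations removed per frame
    println(rpad(r.label, 32), round(speedup; digits = 2), "x, ",
            saved, " fewer allocations")
end
```

Read this way, the sorting change removes almost all of the saved allocations, while the framebuffer_size optimization mostly reduces time.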

SimonDanisch, Oct 15 '24 20:10

Compile Times benchmark

Note that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time; a negative percent means faster than the base branch. Note that GLMakie and WGLMakie run on an emulated GPU, so their runtime benchmarks are much slower. Results are from running:

using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(fig)

| | using | create (compile) | display (compile) | create (runtime) | display (runtime) |
|---|---|---|---|---|---|
| GLMakie | 5.08s (5.05, 5.16) 0.04+- | 109.63ms (108.44, 112.04) 1.51+- | 420.26ms (417.63, 426.82) 3.45+- | 9.45ms (9.32, 9.54) 0.08+- | 25.72ms (25.61, 26.09) 0.17+- |
| master | 5.07s (5.03, 5.14) 0.03+- | 109.24ms (107.98, 110.90) 1.04+- | 653.52ms (645.92, 661.46) 4.91+- | 8.32ms (8.25, 8.41) 0.06+- | 25.75ms (25.62, 26.19) 0.20+- |
| evaluation | 1.00x invariant, 0.01s (0.34d, 0.53p, 0.04std) | 1.00x invariant, 0.39ms (0.30d, 0.58p, 1.27std) | 1.56x faster✅, -233.26ms (-54.96d, 0.00p, 4.18std) | 0.88x slower❌, 1.13ms (15.87d, 0.00p, 0.07std) | 1.00x invariant, -0.03ms (-0.18d, 0.75p, 0.18std) |
| CairoMakie | 5.21s (5.12, 5.28) 0.06+- | 115.04ms (111.63, 119.05) 2.86+- | 172.68ms (167.11, 175.90) 3.85+- | 9.66ms (9.44, 10.15) 0.22+- | 1.29ms (1.25, 1.31) 0.02+- |
| master | 5.10s (5.03, 5.15) 0.04+- | 116.53ms (111.87, 119.73) 2.96+- | 175.60ms (170.60, 180.09) 3.62+- | 9.95ms (9.52, 10.45) 0.34+- | 1.23ms (1.17, 1.25) 0.03+- |
| evaluation | 0.98x slower X, 0.1s (1.97d, 0.00p, 0.05std) | 1.01x invariant, -1.49ms (-0.51d, 0.36p, 2.91std) | 1.02x invariant, -2.92ms (-0.78d, 0.17p, 3.73std) | 1.03x invariant, -0.28ms (-0.98d, 0.10p, 0.28std) | 0.95x slower❌, 0.07ms (2.61d, 0.00p, 0.03std) |
| WGLMakie | 5.69s (5.51, 5.85) 0.11+- | 117.58ms (109.92, 124.66) 5.82+- | 5.33s (4.87, 5.60) 0.28+- | 14.03ms (13.48, 14.72) 0.50+- | 137.67ms (131.05, 146.08) 5.25+- |
| master | 5.59s (5.49, 5.77) 0.09+- | 117.56ms (112.00, 127.43) 5.46+- | 5.72s (5.56, 5.98) 0.16+- | 13.25ms (12.62, 14.38) 0.59+- | 133.13ms (130.56, 136.30) 2.07+- |
| evaluation | 0.98x invariant, 0.1s (0.99d, 0.09p, 0.10std) | 1.00x invariant, 0.02ms (0.00d, 0.99p, 5.64std) | 1.07x faster✅, -0.39s (-1.73d, 0.01p, 0.22std) | 0.94x slower❌, 0.78ms (1.42d, 0.02p, 0.54std) | 0.97x invariant, 4.55ms (1.14d, 0.07p, 3.66std) |

MakieBot, Oct 15 '24 21:10

This is failing because lines drop the model uniform (due to it not being used) when linestyles are used.

ffreyer, Oct 16 '24 22:10

Benchmark Results

SHA: 96d592f9586ac63ff051fb7abc3db079f8db68a2

[!WARNING] These results are subject to substantial noise because GitHub's CI runs on shared machines that are not ideally suited for benchmarking.

GLMakie CairoMakie WGLMakie

MakieBot, Oct 29 '24 14:10

Same benchmark code, different sorting options:

| Change/State | time [µs] | Allocations | Allocated KiB |
|---|---|---|---|
| merge master | 77.2 | 727 | 25.45 |
| rollback sortby function | 90.8 | 1191 | 30.38 |
| typed sortby | 91.2 | 1191 | 30.38 |
| inline transformationmatrix(plot) | 79.6 | 758 | 29.81 |
| inline plot.model[] | 80.7 | 696 | 21.09 |
| using zvalue2d with transformationmatrix() | 84.6 | 820 | 23.03 |

Calling zvalue2d seems to have a significant overhead compared to calling what it does directly, perhaps due to runtime dispatch? plot.model[] is also slower than transformationmatrix(), maybe due to a type instability. I went with calling transformationmatrix() for now, which is pretty close to the original optimization.

Moving around some clip planes code to make setup_clip_planes() type stable got me 53.3µs, 272 allocations, 18 KiB. Waiting on CI before I push that.
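
The kind of type instability discussed above can be reproduced outside Makie. The sketch below is only an illustration under stated assumptions: `UntypedPlot`, `TypedPlot`, `zvalue`, `zvalue_asserted`, and `inferred` are hypothetical names, not Makie's plot type, `zvalue2d`, or `setup_clip_planes()`, and this is not the change made in this PR. It shows how a read through an abstractly typed field is inferred as `Any` (which forces runtime dispatch and boxing in callers), and how a concretely typed field or a type assertion restores inference.

```julia
# Hypothetical stand-ins for illustration only; these are not Makie types.
struct UntypedPlot
    model::Any                  # abstract field: reads are inferred as `Any`
end

struct TypedPlot{M}
    model::M                    # concrete once M is known, e.g. Matrix{Float64}
end

# Read the z translation from a 4x4 model matrix (row 3, column 4).
zvalue(p) = p.model[3, 4]

# Same read, but with a type assertion so inference knows the matrix type.
zvalue_asserted(p::UntypedPlot) = (p.model::Matrix{Float64})[3, 4]

m = [1.0 0 0 0; 0 1 0 0; 0 0 1 2.5; 0 0 0 1]

# Ask the compiler which return type it infers for a one-argument call.
inferred(f, T) = only(Base.return_types(f, (T,)))

println(inferred(zvalue, UntypedPlot))                   # Any
println(inferred(zvalue, TypedPlot{Matrix{Float64}}))    # Float64
println(inferred(zvalue_asserted, UntypedPlot))          # Float64
println(zvalue(UntypedPlot(m)) == zvalue(TypedPlot(m)))  # true
```

All three variants return the same value; only what the compiler can prove about the result differs. In a hot loop such as per-frame sorting, an `Any` result typically shows up as extra allocations, which is consistent with the allocation counts being the most visible difference between the variants in the table above.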

ffreyer, Nov 04 '24 20:11