Makie.jl
fix low hanging fruits for render performance
Description
```julia
using GLMakie
using BenchmarkTools

f, ax, pl = scatter(1:5);
sc = display(f)  # the GLMakie screen that renders the figure
@btime GLMakie.render_frame(sc)  # time a single frame render
```
| State | Time | Allocations | Memory |
|---|---|---|---|
| Makie master | 118.770 μs | 1193 | 30.41 KiB |
| With sorting change | 104.764 μs | 729 | 25.48 KiB |
| With framebuffer_size optimization | 85.648 μs | 727 | 25.45 KiB |
Compile Times benchmark
Note that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time, and negative values mean faster than the base branch. Note that GLMakie and WGLMakie run on an emulated GPU, so their runtime benchmarks are much slower. Results are from running:
```julia
using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(fig)
```
| Backend | using | create (compile) | display (compile) | create (runtime) | display (runtime) |
|---|---|---|---|---|---|
| GLMakie | 5.08s (5.05, 5.16) 0.04+- | 109.63ms (108.44, 112.04) 1.51+- | 420.26ms (417.63, 426.82) 3.45+- | 9.45ms (9.32, 9.54) 0.08+- | 25.72ms (25.61, 26.09) 0.17+- |
| master | 5.07s (5.03, 5.14) 0.03+- | 109.24ms (107.98, 110.90) 1.04+- | 653.52ms (645.92, 661.46) 4.91+- | 8.32ms (8.25, 8.41) 0.06+- | 25.75ms (25.62, 26.19) 0.20+- |
| evaluation | 1.00x invariant, 0.01s (0.34d, 0.53p, 0.04std) | 1.00x invariant, 0.39ms (0.30d, 0.58p, 1.27std) | 1.56x faster✅, -233.26ms (-54.96d, 0.00p, 4.18std) | 0.88x slower❌, 1.13ms (15.87d, 0.00p, 0.07std) | 1.00x invariant, -0.03ms (-0.18d, 0.75p, 0.18std) |
| CairoMakie | 5.21s (5.12, 5.28) 0.06+- | 115.04ms (111.63, 119.05) 2.86+- | 172.68ms (167.11, 175.90) 3.85+- | 9.66ms (9.44, 10.15) 0.22+- | 1.29ms (1.25, 1.31) 0.02+- |
| master | 5.10s (5.03, 5.15) 0.04+- | 116.53ms (111.87, 119.73) 2.96+- | 175.60ms (170.60, 180.09) 3.62+- | 9.95ms (9.52, 10.45) 0.34+- | 1.23ms (1.17, 1.25) 0.03+- |
| evaluation | 0.98x slower X, 0.1s (1.97d, 0.00p, 0.05std) | 1.01x invariant, -1.49ms (-0.51d, 0.36p, 2.91std) | 1.02x invariant, -2.92ms (-0.78d, 0.17p, 3.73std) | 1.03x invariant, -0.28ms (-0.98d, 0.10p, 0.28std) | 0.95x slower❌, 0.07ms (2.61d, 0.00p, 0.03std) |
| WGLMakie | 5.69s (5.51, 5.85) 0.11+- | 117.58ms (109.92, 124.66) 5.82+- | 5.33s (4.87, 5.60) 0.28+- | 14.03ms (13.48, 14.72) 0.50+- | 137.67ms (131.05, 146.08) 5.25+- |
| master | 5.59s (5.49, 5.77) 0.09+- | 117.56ms (112.00, 127.43) 5.46+- | 5.72s (5.56, 5.98) 0.16+- | 13.25ms (12.62, 14.38) 0.59+- | 133.13ms (130.56, 136.30) 2.07+- |
| evaluation | 0.98x invariant, 0.1s (0.99d, 0.09p, 0.10std) | 1.00x invariant, 0.02ms (0.00d, 0.99p, 5.64std) | 1.07x faster✅, -0.39s (-1.73d, 0.01p, 0.22std) | 0.94x slower❌, 0.78ms (1.42d, 0.02p, 0.54std) | 0.97x invariant, 4.55ms (1.14d, 0.07p, 3.66std) |
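The evaluation rows appear to follow a simple recipe, which makes them easier to read. The sketch below is an assumption inferred from the numbers in the table, not taken from the benchmark script: the ratio is the base mean divided by this branch's mean, the difference is this branch's mean minus the base mean, and `d` is Cohen's d using the pooled standard deviation. The function name `compare` is hypothetical.

```julia
# Assumed formulas, inferred from the table (not copied from the benchmark script):
#   ratio = base_mean / pr_mean        (> 1 means this branch is faster)
#   delta = pr_mean - base_mean        (negative means this branch is faster)
#   d     = delta / pooled standard deviation (Cohen's d)
function compare(pr_mean, pr_std, base_mean, base_std)
    ratio = base_mean / pr_mean
    delta = pr_mean - base_mean
    pooled_std = sqrt((pr_std^2 + base_std^2) / 2)
    return (ratio = ratio, delta = delta, d = delta / pooled_std)
end

# GLMakie compile-time `display` column, in ms: this branch vs. master
r = compare(420.26, 3.45, 653.52, 4.91)
println(round(r.ratio; digits = 2), "x, ", round(r.delta; digits = 2), " ms, d = ", round(r.d; digits = 2))
```

This prints a ratio of 1.56x and a difference of -233.26 ms, matching the table. `d` comes out at about -54.97 rather than -54.96, presumably because the bot works from unrounded statistics.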
This is failing because lines drop the `model` uniform (since it isn't used) when linestyles are used.
Benchmark Results
SHA: 96d592f9586ac63ff051fb7abc3db079f8db68a2
> [!WARNING]
> These results are subject to substantial noise because GitHub's CI runs on shared machines that are not ideally suited for benchmarking.
Same benchmark code, different sorting options:
| Change/State | time [µs] | Allocations | Allocated KiB |
|---|---|---|---|
| merge master | 77.2 | 727 | 25.45 |
| rollback sortby function | 90.8 | 1191 | 30.38 |
| typed sortby | 91.2 | 1191 | 30.38 |
| inline transformationmatrix(plot) | 79.6 | 758 | 29.81 |
| inline plot.model[] | 80.7 | 696 | 21.09 |
| using zvalue2d with transformationmatrix() | 84.6 | 820 | 23.03 |
Calling `zvalue2d` seems to carry significant overhead compared to inlining what it does, perhaps due to runtime dispatch. `plot.model[]` is also slower than `transformationmatrix()`, maybe due to type instability. For now I went with calling `transformationmatrix()`, which comes pretty close to the original optimization.
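The suspicion that a sort key read through a loosely typed accessor costs runtime dispatch can be checked in isolation. This is a minimal sketch with hypothetical stand-in types (`LoosePlot`, `TightPlot`), not Makie's plot or renderlist types: one stores the model matrix in a `Dict{Symbol,Any}`, the other in a concretely typed field. Both sort into the same order; what differs is what inference knows about the key. It does not claim to explain the benchmark numbers above.

```julia
using LinearAlgebra: I

# Hypothetical stand-ins for a plot; not Makie types.
struct LoosePlot
    attributes::Dict{Symbol,Any}   # untyped store: lookups infer as `Any`
end

struct TightPlot
    model::Matrix{Float32}         # concrete field: lookups infer as Float32
end

# The z-translation sits in row 3, column 4 of a 4x4 model matrix.
loose_z(p::LoosePlot) = p.attributes[:model][3, 4]
tight_z(p::TightPlot) = p.model[3, 4]

function model_with_z(z::Float32)
    m = Matrix{Float32}(I, 4, 4)
    m[3, 4] = z
    return m
end

zs = Float32[0.5, -1.0, 2.0, 0.0]
loose = [LoosePlot(Dict{Symbol,Any}(:model => model_with_z(z))) for z in zs]
tight = [TightPlot(model_with_z(z)) for z in zs]

sort!(loose; by = loose_z)   # keys are `Any`: each comparison is dispatched at runtime
sort!(tight; by = tight_z)   # keys are Float32: comparisons compile to direct calls

# What inference concludes about each key function
@show Base.return_types(loose_z, (LoosePlot,))
@show Base.return_types(tight_z, (TightPlot,))
```

`@code_warntype loose_z(loose[1])` shows the same `Any` return in more detail.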
Moving some clip-plane code around to make `setup_clip_planes()` type stable got me down to 53.3 µs, 272 allocations, and 18 KiB. I'm waiting on CI before I push that.
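The actual `setup_clip_planes()` change is not shown here. One common way to make such code type stable is a function barrier: extract the loosely typed value once, then pass it to a separate function whose argument types are concrete, so Julia dispatches once instead of on every element. The sketch below assumes a hypothetical `Dict{Symbol,Any}` settings store and a hypothetical `Plane` type; neither is Makie API.

```julia
# Hypothetical plane type; not Makie's clip-plane representation.
struct Plane
    normal::NTuple{3,Float32}
    distance::Float32
end

# Unstable version: `planes` is `Any`, so every `p.normal`, `p.distance`
# and `push!` in the loop goes through runtime dispatch.
function setup_planes_unstable(settings::Dict{Symbol,Any})
    planes = settings[:clip_planes]
    out = NTuple{4,Float32}[]
    for p in planes
        push!(out, (p.normal..., p.distance))
    end
    return out
end

# Type-stable kernel: concrete argument types, so the loop compiles to static calls.
function pack_planes!(out::Vector{NTuple{4,Float32}}, planes::Vector{Plane})
    empty!(out)
    for p in planes
        push!(out, (p.normal..., p.distance))
    end
    return out
end

# Function barrier: one dynamic dispatch at the call to `pack_planes!`.
function setup_planes_barrier(settings::Dict{Symbol,Any})
    planes = settings[:clip_planes]  # still inferred as `Any` here
    return pack_planes!(NTuple{4,Float32}[], planes)
end

settings = Dict{Symbol,Any}(:clip_planes => [Plane((0f0, 0f0, 1f0), 2f0), Plane((1f0, 0f0, 0f0), -1f0)])
packed = setup_planes_barrier(settings)
```

Both versions return the same packed planes; only the barrier version keeps the per-plane loop free of dynamic dispatch.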