wgpu-py
Benchmarking the performance of `JupyterWgpuCanvas`
I went down a rabbit hole testing https://github.com/vispy/jupyter_rfb/pull/76 and trying to figure out why the `JupyterWgpuCanvas` could never seem to exceed 30 fps. `simplejpeg` is very fast: encoding takes a few milliseconds at most, and less than 1 ms for `astronaut.png`. Increasing `widget.max_buffered_frames` worked for a basic `RemoteFrameBuffer`, but made no difference with the `WgpuCanvas`. So I found this:
https://github.com/pygfx/wgpu-py/blob/2ffacd933d60423537826919c3ac6a950188919c/wgpu/gui/jupyter.py#L86-L89
Dividing the `draw_wait_time` on L89 by 2 seems to increase performance! The framerate is about double with small canvases around 512x512, and significantly higher for larger canvases.
I added a `JupyterWgpuCanvas.delay_divisor`, which divides the `draw_wait_time` in the `call_later` call, to test things:
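To make the idea concrete, here is a minimal, self-contained sketch of the mechanism. It is not the actual wgpu-py code: `ToyCanvas`, its `draw_wait_time` argument, and the injected `call_later` function are hypothetical stand-ins, and only the division of the wait time by `delay_divisor` mirrors the change described above.

```python
from typing import Callable, List


class ToyCanvas:
    """Hypothetical stand-in for a canvas that throttles its own redraws."""

    def __init__(
        self,
        call_later: Callable[[float, Callable[[], None]], None],
        draw_wait_time: float,
    ) -> None:
        self._call_later = call_later          # scheduler, e.g. an event loop's call_later
        self._draw_wait_time = draw_wait_time  # assumed fixed here; a real canvas would compute it
        self.delay_divisor = 1.0
        self._draw_pending = False
        self.frames_drawn = 0

    def _get_draw_wait_time(self) -> float:
        return self._draw_wait_time

    def request_draw(self) -> None:
        # Schedule at most one draw at a time.
        if self._draw_pending:
            return
        self._draw_pending = True
        # The only change under discussion: shorten the wait by the divisor.
        delay = self._get_draw_wait_time() / self.delay_divisor
        self._call_later(delay, self._draw)

    def _draw(self) -> None:
        self._draw_pending = False
        self.frames_drawn += 1


def scheduled_delays(
    delay_divisor: float, draw_wait_time: float, n_requests: int = 3
) -> List[float]:
    """Request draws through a fake scheduler and return the delays it was asked for."""
    delays: List[float] = []

    def fake_call_later(delay: float, callback: Callable[[], None]) -> None:
        delays.append(delay)
        callback()  # run immediately instead of waiting

    canvas = ToyCanvas(fake_call_later, draw_wait_time)
    canvas.delay_divisor = delay_divisor
    for _ in range(n_requests):
        canvas.request_draw()
    return delays


print(scheduled_delays(1, 0.5))  # [0.5, 0.5, 0.5]
print(scheduled_delays(2, 0.5))  # [0.25, 0.25, 0.25]
```

Injecting the scheduler keeps the sketch independent of any event loop, so the effect of the divisor on the requested delay can be checked without real timing.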
Ran these on a Radeon RX 570 (old GPU), will test on a more modern GPU later today.
This gives me 54fps:
```python
import fastplotlib as fpl
import numpy as np

a = np.random.rand(100, 100)

plot = fpl.Plot(size=(500, 500), name="fps: 0")
plot.add_image(a)


def update_frame(p):
    # push new random data every frame and show the measured fps in the title
    p.graphics[0].data = np.random.rand(100, 100)
    fps = p.canvas.get_stats()["fps"]
    p.set_title(f"fps: {fps:.01f}")


plot.add_animations(update_frame)

# 30 fps without these two lines
plot.canvas.max_buffered_frames = 20
plot.canvas.delay_divisor = 2

plot.show(sidecar=False)
```
With `delay_divisor = 2`, the lag is barely perceptible during interaction; if it's increased to 4, the lag becomes very noticeable and the framerate barely increases. The `pygfx` controller also has damping, which probably helps to reduce the effects of lag (if present) with `delay_divisor = 2`.
delay_divisor = 1:
https://github.com/pygfx/wgpu-py/assets/9403332/97e576d5-de27-42d9-ab6d-de2eb5b36dd0
delay_divisor = 2:
https://github.com/pygfx/wgpu-py/assets/9403332/00a856a1-ce2f-49f6-84fe-0b6384e0f660
delay_divisor = 4, lag is very obvious:
https://github.com/pygfx/wgpu-py/assets/9403332/11601334-a07e-4fe3-8a29-95916c214e25
I benchmarked this with a larger canvas at 1700x900 and got this, which seems to suggest that dividing the delay by 2 and buffering 10-20 frames gives the best performance. If there's a way to measure input lag that would be nice to factor in as well!
Benchmarking code:
```python
from itertools import product

import fastplotlib as fpl
import numpy as np
import pandas as pd
import seaborn as sns

delays = range(1, 10)
max_buffered = range(2, 100, 5)

# every (delay_divisor, max_buffered_frames) combination to measure
test_grid = list(product(delays, max_buffered))

results = pd.DataFrame(index=delays, columns=max_buffered)

plot = fpl.Plot(size=(1700, 900))

# pre-make image
img = np.random.rand(900, 1700)
plot.add_image(img)

# index of the configuration currently being measured
i = 0
plot.canvas.delay_divisor = test_grid[i][0]
plot.canvas.max_buffered_frames = test_grid[i][1]


def update_frame(p):
    global i

    if i >= len(test_grid):
        # every configuration has been measured
        return

    stats = p.canvas.get_stats()
    fps = stats["fps"]

    if stats["sent_frames"] > 500:
        # record fps for the configuration that was just measured
        results.loc[test_grid[i]] = fps

        i += 1
        if i == len(test_grid):
            p.set_title("done!")
            return

        # switch to the next configuration and restart the stats
        p.canvas.delay_divisor = test_grid[i][0]
        p.canvas.max_buffered_frames = test_grid[i][1]
        p.canvas.reset_stats()

    p.set_title(f"fps: {fps:.01f}")


plot.add_animations(update_frame)
plot.show(sidecar=False)

# run this once the title shows "done!" (e.g. in a separate cell)
ax = sns.heatmap(results.astype(float).round(), cmap="viridis", cbar_kws={"label": "fps"})
ax.set_ylabel("delay divisor")
ax.set_xlabel("max buffered frames")
```
Side note: the drop in fps with ~7 buffered frames is really odd.
Questions before doing a PR:
- Is the value returned by `JupyterWgpuCanvas._get_draw_wait_time()` derived from a round-trip measurement, which would explain why dividing it by 2 increases performance?
- Do you think it would be a good idea to add `delay_divisor` as a `@property` to `JupyterWgpuCanvas`? Or is there some other caveat? The only drawback I found is input lag if the `delay_divisor` is large, but the benchmarks seem to show that increasing it beyond 2 doesn't increase performance anyway.
- I still need to test this without https://github.com/vispy/jupyter_rfb/pull/76 to see if it's rate limiting.
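If it does become a property, one possible shape is sketched below. This is purely illustrative: `DelayDivisorMixin` and `DemoCanvas` are hypothetical names, and rejecting values that are not positive is my assumption (a divisor of zero would raise `ZeroDivisionError` wherever the wait time is divided).

```python
class DelayDivisorMixin:
    """Hypothetical mixin; not part of wgpu-py."""

    _delay_divisor = 1.0  # default: leave the draw wait time unchanged

    @property
    def delay_divisor(self) -> float:
        """Factor by which the draw wait time is divided."""
        return self._delay_divisor

    @delay_divisor.setter
    def delay_divisor(self, value: float) -> None:
        value = float(value)
        # "not value > 0" also rejects NaN, which "value <= 0" would let through
        if not value > 0:
            raise ValueError("delay_divisor must be greater than 0")
        self._delay_divisor = value


class DemoCanvas(DelayDivisorMixin):
    """Stand-in for a canvas class that would pick up the property."""


canvas = DemoCanvas()
canvas.delay_divisor = 2
print(canvas.delay_divisor)  # 2.0
```

Validating in the setter keeps bad values from reaching the scheduling code, where a failure would be harder to trace back to the assignment.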
Now this makes me wonder what the real bottleneck is when the canvas is very large, like near 4k. `simplejpeg` starts to slow down at these resolutions, so I might look into nvjpeg.
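One way to check whether encoding is the bottleneck is to compare the encode time per frame against the frame budget for a target frame rate (about 16.7 ms at 60 fps). The harness below is a rough sketch using only the standard library: `time_encoder` and `fits_budget` are hypothetical helpers, and `zlib.compress` on a zero-filled buffer is just a placeholder for a real encoder and a real frame.

```python
import time
import zlib
from statistics import median
from typing import Any, Callable


def time_encoder(encode: Callable[[Any], Any], frame: Any, repeats: int = 20) -> float:
    """Median wall-clock time, in seconds, of one encode(frame) call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode(frame)
        timings.append(time.perf_counter() - start)
    return median(timings)


def fits_budget(encode_seconds: float, target_fps: float) -> bool:
    """True if one encode fits within the per-frame time budget at target_fps."""
    return encode_seconds <= 1.0 / target_fps


# Placeholder frame: zero-filled bytes the size of a 512x512 RGB image.
frame = bytes(512 * 512 * 3)
encode_seconds = time_encoder(zlib.compress, frame)
print(f"median encode time: {encode_seconds * 1e3:.2f} ms")
print("fits 60 fps budget:", fits_budget(encode_seconds, 60))
```

To time the real encoder, pass a callable that wraps it (for example one that calls `simplejpeg.encode_jpeg` on a `uint8` NumPy array) in place of `zlib.compress`, and use a frame at the resolution of interest.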