Chart graphics with drawing speedups
Had a great chat with a collaborator discussing some options for squeezing performance out of pyqtgraph.
A lot of it confirmed ideas I'd had previously, but a few nuggets are going to be particularly fun to experiment with.
Also note, there is no real latency problem yet; this is in anticipation of updating multiple assets per plot.
This will all be more relevant once the first draft of #10 lands :surfing_man: (soooon).
As part of that landing will come a naive charting update system using some well-known slowness like `np.append()`/`np.concatenate()` (which frankly is nowhere near being a bottleneck yet) but which we can of course improve upon.
The tips received for improving rendering latency and general datum compaction include:
- charts are resized much less frequently than interactions within them, so allocate a `numpy` array 4x the size and use `numba` to do the subsetting (aka downsampling in graphics land)
- the general strategy for said downsampling (if drawing tick-like, event-triggered lines) is: with only 4 points per pixel, bin by the width of the chart and put the entry val, min, max, and exit in each bin; it will then look as accurate as the full data (a single-pass `numba` sketch is shown after this list)
  - this idea comes from 2 papers related to a data viz downsampling algorithm:
    - M4: A Visualization-Oriented Time Series Data Aggregation
    - Faster Visual Analytics through Pixel-Perfect Aggregation
  - look at fig 3 for why you need 4 points; otherwise you get discontinuities as you zoom. The idea is for the sampled version to render identically to throwing all the data in the range at a standard line drawing algo (without the chart library trying to do anything smart). Anyway, that's only if you care about pixel-width blips ;)
- `pyqtgraph` has some similar implementations for lines in the code base that could potentially be improved with this:
  - the `clipToView` algo code
  - the `downsampleMethod == 'peak'` code
    - the peak method is closest but doesn't account for the entry and exit points of each bin, and it computes min/max in two passes; since they can only use numpy they're doing things a bit sub-optimally with multiple passes instead of one
- there's also a cache-friendly variation of binary search called galloping (or exponential) search: instead of jumping back and forth in the array you only step in one direction until you overshoot, then finish the final range with binary search; it works better for large arrays (sketched below)
- you can also build variations of the `numba` code to do stepped rendering, which looks better for financial data where the price is constant until the next tick
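To make the binning strategy concrete, here's a minimal sketch of a single-pass, `numba`-jitted M4-style kernel; the function name, signature and bin layout are hypothetical, not piker's actual code:

```python
import numpy as np
from numba import njit


@njit
def m4_bin(x, y, x0, x1, px_width):
    '''Single-pass M4 binning: one (entry, min, max, exit) tuple per
    horizontal pixel so the downsampled render matches the full data.
    '''
    entries = np.empty(px_width, dtype=y.dtype)
    mins = np.empty(px_width, dtype=y.dtype)
    maxs = np.empty(px_width, dtype=y.dtype)
    exits = np.empty(px_width, dtype=y.dtype)
    filled = np.zeros(px_width, dtype=np.bool_)

    scale = px_width / (x1 - x0)
    for i in range(len(x)):
        # map this datum onto a pixel-column bin
        b = int((x[i] - x0) * scale)
        if b < 0 or b >= px_width:
            continue
        v = y[i]
        if not filled[b]:
            # first datum in the bin seeds all four slots
            entries[b] = v
            mins[b] = v
            maxs[b] = v
            exits[b] = v
            filled[b] = True
        else:
            if v < mins[b]:
                mins[b] = v
            elif v > maxs[b]:
                maxs[b] = v
            exits[b] = v  # last datum seen so far is the exit

    return entries, mins, maxs, exits, filled
```

And a sketch of the galloping (exponential) search mentioned above, returning the same insertion index as a left-biased binary search:

```python
import numpy as np
from numba import njit


@njit
def gallop_search(a, target):
    '''Find the leftmost index where a[i] >= target in a sorted array.

    Steps forward in powers of 2 (one direction only, cache friendly)
    until the target is bracketed, then binary searches the final span.
    '''
    n = len(a)
    if n == 0 or target <= a[0]:
        return 0
    bound = 1
    while bound < n and a[bound] < target:
        bound *= 2
    lo = bound // 2          # known: a[lo] < target
    hi = min(bound + 1, n)   # exclusive upper bound
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```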
For updating the data layer, instead of using the extremely naive `np.append()`/`np.concatenate()` (which is stupidly slow):
- you need to treat it like a `std::vector` resize: allocate ~1.25x the size of the old array and memcpy the old contents over (see the sketch just after this list)
- `numpy` in this case just stands in for `malloc`, allocating a block of contiguous memory in python
- `numpy` + `numba` gives you a bit of a systems lang (at least for numeric calcs)
- one nice thing about `numba` is that you can compile code on the fly, so if you were setting up a streaming data calc you could use a bit of python's meta facilities to do some loop unrolling and function call fusion, compile one combined execution kernel (if you built a composable system), and then at system start compile all your strategies down to fairly optimized machine code
  - an example of LMS in scala
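A minimal sketch of that vector-style growth strategy (amortized appends via geometric over-allocation); `GrowableArray` is a hypothetical name, not the shm subsystem from #112:

```python
import numpy as np


class GrowableArray:
    '''Append-only 1D buffer that over-allocates by a growth factor so
    appends are amortized O(1), instead of O(n) per call like np.append().
    '''
    def __init__(self, dtype, capacity: int = 1024, growth: float = 1.25):
        self._buf = np.empty(capacity, dtype=dtype)
        self._len = 0
        self._growth = growth

    def append(self, values: np.ndarray) -> None:
        needed = self._len + len(values)
        if needed > len(self._buf):
            # grow geometrically and memcpy the old contents over
            new_cap = max(needed, int(len(self._buf) * self._growth) + 1)
            new_buf = np.empty(new_cap, dtype=self._buf.dtype)
            new_buf[:self._len] = self._buf[:self._len]
            self._buf = new_buf
        self._buf[self._len:needed] = values
        self._len = needed

    @property
    def array(self) -> np.ndarray:
        # zero-copy view of the valid region
        return self._buf[:self._len]
```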
The numba/LMS point above gets into more ideas we've had around building a fast FSP stream processing system as per #106, #102, #107. Ideally we move towards a small DSL for compiling chained numba routines that can be easily declared from UI components.
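As a taste of that direction, here's a hedged sketch of fusing a chain of scalar stages into one `numba`-compiled kernel at startup; `fuse()` and the example stages are hypothetical, not an actual piker DSL:

```python
import numpy as np
from numba import njit


def _compose2(g, h):
    # numba can call jitted funcs captured as closure vars, so each
    # composition step stays inlinable by the compiler
    @njit
    def composed(x):
        return h(g(x))
    return composed


def fuse(*fns):
    '''Fuse scalar stages left-to-right into a single jitted kernel
    applied element-wise over an input array (no intermediate arrays).
    '''
    fused = njit(fns[0])
    for f in fns[1:]:
        fused = _compose2(fused, njit(f))

    @njit
    def kernel(src, out):
        for i in range(len(src)):
            out[i] = fused(src[i])

    return kernel


# compile the whole "strategy" down once at system start:
scale = lambda x: x * 0.5
clamp = lambda x: x if x > 0.0 else 0.0
kernel = fuse(scale, clamp)

src = np.random.randn(1000)
out = np.empty_like(src)
kernel(src, out)
```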
Thanks to finding pyqtgraph/pyqtgraph#1418, most of the CPU usage and latency issues are actually completely solved now! We'll likely depend on our own fork until that fix lands (which may be a while given the constraints with Qt4).
The latency measures I've been sampling on lines and cursor draw cycles still scale approximately as
`latency = (X_bars / 1000) * 1ms`
i.e. roughly 1ms per 1k bars drawn.
So, an attempt at creating lines graphics segments (of say 500 bars per slice) would likely improve latency when the user is viewing smaller data sets, i.e. when watching real-time / shorter time frames.
I've already toyed with this idea where we draw separate pictures for history vs. the current bar to avoid calling `QPicture.drawLines()` more than necessary and with as little data as possible per update. Making `BarsItems` hold an array of `QPicture`s which are drawn based on the bars in view will likely give us that little boost we're after when viewing near term data (rough sketch below).
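A rough, hypothetical sketch of that `QPicture`-slicing idea (PyQt5-flavored; `draw_slice` stands in for whatever actually paints 500 bars):

```python
from PyQt5.QtGui import QPicture, QPainter


class SlicedBarsCache:
    '''Cache bar graphics as fixed-size QPicture slices and only
    replay the slices overlapping the in-view bar range on paint.
    '''
    def __init__(self, bars_per_slice: int = 500):
        self.bars_per_slice = bars_per_slice
        self._slices: list[QPicture] = []

    def render_slice(self, draw_slice, bars) -> None:
        # record one slice of bars into its own QPicture
        pic = QPicture()
        painter = QPainter(pic)
        draw_slice(painter, bars)
        painter.end()
        self._slices.append(pic)

    def paint(self, painter: QPainter, first_bar: int, last_bar: int) -> None:
        # replay only the pictures covering the visible bar indices
        start = first_bar // self.bars_per_slice
        stop = last_bar // self.bars_per_slice
        for pic in self._slices[start:stop + 1]:
            pic.play(painter)
```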
Oh, also we've removed the `np.append()` stuff and now have a new shared mem subsystem coming in #112, so all that discussion can be marked solved.
Good news.
Got a massive, massive latency fix with large(r) data sets by using pyqtgraph's `functions.arrayToQPath()` to generate a `QPainterPath` with gaps in it to make up the bars graphics. Turns out this not only scales better for initial draws but also results in less `QGraphicsObject.paint()` latency when zoomed in (was ~1ms / 1k bars but is now ~3ms / 15k bars).
The only outstanding issue is getting appends to the historical bars path to be fast; this is still ongoing.
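For reference, a hedged sketch of building such a gapped path with a `connect` mask; the 3-segments-per-bar layout here is illustrative and may differ from piker's exact encoding:

```python
import numpy as np
import pyqtgraph as pg


def ohlc_bars_path(opens, highs, lows, closes, w: float = 0.4):
    '''Build one QPainterPath of disjoint OHLC bar glyphs: a horizontal
    open tick, a vertical high-low body, and a horizontal close tick
    per bar, with the connect mask breaking the path between segments.
    '''
    n = len(opens)
    xs = np.arange(n, dtype=float)

    x = np.empty(n * 6)
    y = np.empty(n * 6)
    x[0::6] = xs - w; y[0::6] = opens   # open tick, left end
    x[1::6] = xs;     y[1::6] = opens   # open tick, at the bar
    x[2::6] = xs;     y[2::6] = lows    # body bottom
    x[3::6] = xs;     y[3::6] = highs   # body top
    x[4::6] = xs;     y[4::6] = closes  # close tick, at the bar
    x[5::6] = xs + w; y[5::6] = closes  # close tick, right end

    # connect[i] joins vertex i to i+1; only tick/body pairs connect,
    # leaving gaps between segments and between bars
    connect = np.zeros(n * 6, dtype=bool)
    connect[0::6] = connect[2::6] = connect[4::6] = True

    return pg.functions.arrayToQPath(x, y, connect=connect)
```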
Linking some Qt core patches that have been submitted by the pyqtgraph team:
- https://codereview.qt-project.org/c/pyside/pyside-setup/+/415702