data decimation transform

Open Fil opened this issue 2 years ago • 7 comments
A transform to decimate (sample) data, by filtering the index.

Possible strategies:

  • % sample: not smart, but easy to understand.
  • LTTB (largest-triangle-three-buckets; ref. https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf): adaptive binning.
  • M4 (as seen in Mosaic): binning.

In practice we probably don't need all the methods; having one by default would be enough. M4 is easy to implement.
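To illustrate why M4 is easy to implement, here is a minimal sketch, not the code from any PR or from Plot itself. M4 splits the horizontal axis into buckets and keeps, for each bucket, the first, last, minimum and maximum points. The function name `m4Index` and its signature are hypothetical; it assumes the index is sorted by increasing X and that X and Y hold defined numeric values.

```javascript
// Hypothetical helper (not part of Plot's API): M4 decimation of an index.
// index: positions into X and Y, assumed sorted by increasing X.
// X, Y: numeric arrays (for example, values already scaled to pixels).
// pixelSize: width of one bucket, in the same unit as X.
function m4Index(index, X, Y, pixelSize = 1) {
  const out = [];
  let bucket; // id of the bucket being filled
  let first, last, min, max; // positions (into index) of the kept points
  const flush = () => {
    if (first === undefined) return; // nothing collected yet
    // Deduplicate, then restore the original order within the bucket.
    const kept = [...new Set([first, min, max, last])].sort((a, b) => a - b);
    for (const k of kept) out.push(index[k]);
  };
  for (let k = 0; k < index.length; ++k) {
    const i = index[k];
    const b = Math.floor(X[i] / pixelSize);
    if (b !== bucket) {
      // Entering a new bucket: emit the previous one and start over.
      flush();
      bucket = b;
      first = last = min = max = k;
    } else {
      last = k;
      if (Y[i] < Y[index[min]]) min = k;
      if (Y[i] > Y[index[max]]) max = k;
    }
  }
  flush(); // emit the final bucket
  return out;
}

// Example: nine points that fall into two buckets of width 1.
const X = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.2, 1.5, 1.9];
const Y = [5, 9, 1, 4, 7, 3, 2, 8, 6];
const decimated = m4Index([0, 1, 2, 3, 4, 5, 6, 7, 8], X, Y);
```

Each bucket contributes at most four points, so the output size is bounded by four times the number of buckets, whatever the input size.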

Fil, Jun 21 '23 20:06

A good place to apply decimation (almost transparently) is just before rendering. The index is filtered, and the X and Y values are scaled. We can filter the index again, so that the rendering is (almost) the same, but with a lighter path/footprint.

This notebook implements the M4 strategy on the line mark: https://observablehq.com/@fil/fast-brush-with-line-simplification

A line chart based on 10 million points, which was impossible to render, becomes possible. A brush (#5) can even be added and still run at interactive speed.

Fil, Nov 27 '23 18:11

I would love for us to do decimation automatically (and transparently) when rendering areas and lines. (Even if only for the linear curve… and maybe we can make it work for the step curve too.)

mbostock, Dec 27 '23 19:12

I think I've solved the issue with curves (of all known types) in the prototype notebook. This now uses an extension of M4, where I add the first and last points, and for some curves the second and next-to-last points too. I'll work on a PR.

Fil, Jan 02 '24 11:01

I have two requests:

  1. I need to smooth a time series so it doesn't look so erratic when plotted, but it is important that I keep the peaks, which get smoothed out by the windowY transform. Would it be possible to make this new down-sampling method one of the reduce options in the windowY transform, so I could both smooth the data and keep the peaks?

  2. In another problem, I have to down-sample an array in JavaScript. So it would be very useful to me if you could provide direct access to the JavaScript function for this down-sampling algorithm, similar to how I can compute e.g. histograms using a D3 function without actually plotting them.

Thanks!

Hvass-Labs, Jan 15 '24 15:01

For 1, let me refer to this notebook: https://observablehq.com/@fil/time-series-topological-subsampling. This framework offers a good way to think about the problem (like formally defining what a "peak" is), and the algorithm is pretty fast. There is a link to a second notebook that uses it with Plot.

For 2, if you still want to use the M4 strategy, you could adapt the decimateIndex function I'm suggesting in the PR. It uses normalized values (in pixels), with a scaling factor pixelSize that you can tweak to decide which values of the horizontal component fall into the same "bucket". For example, if X contains dates and the unit bucket that you're considering is an hour, you would use pixelSize: 3_600_000 (60 × 60 × 1000 milliseconds).
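As a rough illustration of how such a pixelSize groups dates into hourly buckets, here is a small sketch. The helper `bucketIds` is hypothetical and is not the decimateIndex function from the PR; it only shows the bucketing arithmetic, assuming dates are coerced to milliseconds since the epoch.

```javascript
// Hypothetical helper: assign each value of X to a bucket of width pixelSize.
// Dates are coerced to epoch milliseconds with the unary plus operator.
const HOUR = 3_600_000; // 60 × 60 × 1000 milliseconds

function bucketIds(X, pixelSize) {
  return Array.from(X, (x) => Math.floor(+x / pixelSize));
}

const dates = [
  new Date("2024-01-15T10:05:00Z"),
  new Date("2024-01-15T10:55:00Z"),
  new Date("2024-01-15T11:00:00Z"),
  new Date("2024-01-15T12:30:00Z")
];

// The first two dates share a bucket; the others land in later buckets.
const ids = bucketIds(dates, HOUR);
```

Because the buckets are counted from the epoch, they align with UTC hours; aligning with local time would need an offset, which this sketch does not handle.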

Fil, Jan 15 '24 18:01

Thanks for the suggestions!

I have taken a look at your notebook and it looks great, but it is also considerably beyond my skill level in this field :-) So hopefully you will one day make this an easy-to-use transform like windowY that everyone can use.

But let me elaborate a bit on why I need this kind of smoothing. The time series contains e.g. 16,000 daily data points, which look a bit erratic when plotted without smoothing.

I am also using your brushing / selection feature so the user can select a range of the plot and copy the data. But the extremes are fairly important in this application, so the user may be surprised if the copied data has more extreme values than what is shown in the plot.

That's why I think it may be good to smooth the data while keeping the extremes.

smoothing using windowY (1)

smoothing using windowY (2)

Hvass-Labs, Jan 17 '24 11:01

See also AM4, described in the DashQL paper: https://arxiv.org/pdf/2306.03714.pdf

mbostock, Apr 05 '24 21:04