
Deprecate Pipeline

Open ijstokes opened this issue 7 years ago • 6 comments

It would be nice to have an option to boost the data point size when the overall view occupancy is low. A quick look at an example like this makes me think there could be some fixed ratios used to scale the marker area by 2x, 3x, or 4x as overall pixel occupancy drops. This may not be feasible when there are overlapping data categories with different markers or colors, if I'm understanding Datashader's pipeline correctly (I believe it produces a rasterized rather than vector image). But if it were optional, the choice to use it could be left to the developer.

screenshot 2017-09-28 22 11 56

ijstokes avatar Sep 29 '17 02:09 ijstokes

Right, that's how dynspread works. In recent HoloViews versions, just import dynspread and datashade from holoviews.operation.datashader, then call dynspread(datashade(...)). Outside of HoloViews, you can apply datashader.transfer_functions.dynspread to the resulting RGB array. It takes some arguments that can make it work better in certain cases, but it does roughly what you described: it measures the fraction of pixels that touch other active pixels and increases the point size until that fraction reaches some threshold or a maximum size is reached.
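To make the heuristic described above concrete, here is a minimal pure-Python sketch of that kind of dynamic spreading. It is not Datashader's actual code: the function names (`spread`, `neighbor_fraction`, `dynamic_spread`), the square point shape, and the default `threshold` and `max_px` values are all assumptions chosen for illustration, and the image is simplified to a grid of booleans.

```python
def spread(grid, px):
    """Return a copy of grid with every active cell grown into a square of radius px."""
    h, w = len(grid), len(grid[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x]:
                for ny in range(max(0, y - px), min(h, y + px + 1)):
                    for nx in range(max(0, x - px), min(w, x + px + 1)):
                        out[ny][nx] = True
    return out


def neighbor_fraction(grid, radius):
    """Fraction of active cells with another active cell within `radius` cells."""
    h, w = len(grid), len(grid[0])
    active = [(y, x) for y in range(h) for x in range(w) if grid[y][x]]
    if not active:
        return 0.0  # assumption: an empty image counts as maximally sparse
    touching = 0
    for y, x in active:
        touching += any(
            grid[ny][nx] and (ny, nx) != (y, x)
            for ny in range(max(0, y - radius), min(h, y + radius + 1))
            for nx in range(max(0, x - radius), min(w, x + radius + 1))
        )
    return touching / len(active)


def dynamic_spread(grid, threshold=0.5, max_px=3):
    """Grow points as far as possible without too many of them running together."""
    px = 0
    for candidate in range(1, max_px + 1):
        # Two points grown by `candidate` cells overlap once they lie
        # within 2 * candidate cells of each other.
        if neighbor_fraction(grid, radius=2 * candidate) > threshold:
            break
        px = candidate
    return spread(grid, px), px
```

With two isolated points on a 30x30 grid, this sketch grows each point to the maximum radius; with two adjacent points, it leaves the image unchanged, since any growth would merge them.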

jbednar avatar Sep 29 '17 02:09 jbednar

An alternative that's faster and a bit easier to reason about is to provide a minimum x_sampling and y_sampling width to datashade() in data coordinates. E.g. for geographic data, you can say that sampling should never be finer than 1 meter or 10 meters, so when you zoom in far enough the pixel size will eventually be stuck at that size, keeping each point visible. But this approach won't ever reveal any data points spaced more closely than that, the appropriate value depends strongly on the dataset properties, and it will always give you square pixels. So use whichever approach you prefer!
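The arithmetic behind a minimum sampling width can be sketched as follows. This is only an illustration of the idea, not HoloViews code: `clamp_resolution` and its parameters are hypothetical names, and the exact rounding HoloViews applies is not something this sketch claims to reproduce.

```python
def clamp_resolution(data_range, requested_px, min_sampling):
    """Number of bins along one axis such that no bin is narrower than
    `min_sampling` in data units (hypothetical helper for illustration)."""
    lo, hi = data_range
    span = hi - lo
    # The most bins we can fit without going below the minimum bin width.
    max_bins = max(1, int(span // min_sampling))
    return min(requested_px, max_bins)


# Zoomed out: a 10 km wide view on an 800 px plot with a 10 m minimum.
# Each pixel already covers 12.5 m, so the full resolution is used.
wide = clamp_resolution((0, 10_000), requested_px=800, min_sampling=10)

# Zoomed in: a 50 m wide view. Only 5 bins fit at 10 m each, so each
# bin is stretched across 800 / 5 = 160 screen pixels and stays visible.
narrow = clamp_resolution((0, 50), requested_px=800, min_sampling=10)
```

The same calculation applies independently to the y axis with y_sampling.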

jbednar avatar Sep 29 '17 02:09 jbednar

The screenshot above comes from the datashader example census.ipynb:

https://github.com/bokeh/datashader/blob/master/examples/census.ipynb

I've read the docs, which seem to suggest that dynspread is the default transfer function, and I've tried to add some settings based on the examples on this page, but I'm not seeing any impact. Unfortunately, the "compare these two outputs" examples don't look any different for me as I zoom in:

screenshot 2017-09-28 22 51 18

Any advice on how to properly harness dynspread with the census data would be appreciated. If you want to see the exact area I'm looking at, you can go to Syracuse, New York and zoom in (census.ipynb).

ijstokes avatar Sep 29 '17 02:09 ijstokes

That's odd; what docs suggest a default transfer function? There is no default transformation like that, apart from shade() itself turning data into images. I'll need to edit whatever that is, to clarify it.

The "Large_Data" example you show there will work fine on a live server, but there's no Python server behind that web page, and so dynspread isn't actually being used unless you run that page yourself on your local machine. I've added that note to the page for clarity in my own clone of the repo, but we already have a bigger issue opened saying we need lots of those warnings, and unfortunately our website building is currently broken so we can't update things easily. :-(

Not sure what you're seeing in your live coding, but if you post it, it should be easy to debug, because the effect of dynspread is usually very obvious.

jbednar avatar Sep 29 '17 03:09 jbednar

The documentation here on datashader.Pipeline with the parameter spread_fn says that dynspread is the default function:

http://datashader.readthedocs.io/en/latest/api.html#datashader.Pipeline

For the issue with the docs, I've reported that separately in #484.

If you don't mind, I'm going to leave this open until I find the right way to apply dynspread to the census.ipynb example so I can see an effect. But if you want to close this (or if it looks stale), I won't mind.

ijstokes avatar Sep 30 '17 01:09 ijstokes

Ah! That's the default for Pipeline, but you aren't using Pipeline, and in general I think we should deprecate Pipeline now that HoloViews is a much more useful interface. Thanks for the reminder! I'll rename this issue so I don't lose track of it.

jbednar avatar Sep 30 '17 12:09 jbednar