datashader
Deprecate Pipeline
It would be nice to have an option to boost the data point size when the overall view occupancy is low. A quick look at an example like this makes me think there could be some fixed ratios used to scale the marker area by 2x, 3x, or 4x as overall pixel occupancy drops. This may not be feasible when there are overlapping data categories with different markers or colors, if I'm understanding Datashader's pipeline correctly (I believe it produces a rasterized rather than a vector image). But if it were optional, the choice to use it could be left to the developer.

Right, that's how dynspread works. In recent HoloViews versions, just import dynspread and datashade from holoviews.operation.datashader, then do dynspread(datashade(...)). Outside of HoloViews, you can apply datashader.transfer_functions.dynspread to the resulting RGB array. It takes some arguments that can make it work better in certain cases, but it roughly does what you described: it measures the fraction of pixels that touch other active pixels, and increases the size until that fraction reaches some threshold or a maximum size is reached.
An alternative that's faster and a bit easier to reason about is to provide a minimum x_sampling and y_sampling width to datashade() in data coordinates. E.g. for geographic data, you can say that sampling should never be finer than 1 meter or 10 meters, so when you zoom in far enough the pixel size will eventually be stuck at that size, which keeps each point visible. But this approach won't ever reveal any data points spaced more closely than that, the appropriate value depends strongly on the dataset properties, and it will always give you square pixels. So use whichever approach you prefer!
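To make the sampling idea concrete, here is a simplified model of the arithmetic for a single axis. It is only a sketch of the concept and not HoloViews' actual code; the function name clamp_grid_size and its arguments are hypothetical.

```python
def clamp_grid_size(span, pixels, min_sampling=None):
    """Return (n_bins, bin_width) for one axis.

    span         -- visible extent of the axis in data units (e.g. meters)
    pixels       -- plot size along that axis in screen pixels
    min_sampling -- finest allowed bin width in data units, or None
    """
    n_bins = pixels
    if min_sampling:
        # Never use more bins than the minimum bin width allows.
        n_bins = int(min(span / min_sampling, pixels))
    n_bins = max(n_bins, 1)
    return n_bins, span / n_bins


# Zoomed out: 100 km over 800 px is 125 m per bin, coarser than 10 m,
# so the grid is left unchanged.
print(clamp_grid_size(100_000, 800, min_sampling=10))  # (800, 125.0)

# Zoomed in: 1 km over 800 px would be 1.25 m per bin, so the grid is
# reduced to 100 bins of 10 m, each drawn several screen pixels wide.
print(clamp_grid_size(1_000, 800, min_sampling=10))    # (100, 10.0)
```

Under this model, a 10 meter minimum would correspond to passing x_sampling=10 and y_sampling=10 to datashade() for data in meters.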
The screenshot above comes from the datashader example census.ipynb:
https://github.com/bokeh/datashader/blob/master/examples/census.ipynb
I've read the docs, which seem to suggest that dynspread is the default transfer function, and I've tried to add some settings based on the examples on this page, but I'm not seeing any impact. Unfortunately the "compare these two outputs" examples don't look any different for me as I zoom in:

Any advice on how to properly harness dynspread with the census data would be appreciated. If you want to see the exact area I'm looking at, you can go to Syracuse, New York and zoom in (census.ipynb).
That's odd; what docs suggest a default transfer function? There is no default transformation like that, apart from shade() itself turning data into images. I'll need to edit whatever that is, to clarify it.
The "Large_Data" example you show there will work fine on a live server, but there's no Python server behind that web page, and so dynspread isn't actually being used unless you run that page yourself on your local machine. I've added that note to the page for clarity in my own clone of the repo, but we already have a bigger issue opened saying we need lots of those warnings, and unfortunately our website building is currently broken so we can't update things easily. :-(
Not sure what you're seeing for your live coding, but if you post that it should be easy to debug, because the effect of dynspread is usually very obvious.
The documentation here on datashader.Pipeline with the parameter spread_fn says that dynspread is the default function:
http://datashader.readthedocs.io/en/latest/api.html#datashader.Pipeline
For the issue with the docs, I've reported that separately in #484.
If you don't mind, I'm going to leave this open until I find the right way to apply dynspread to the census.ipynb example so I can see an effect. But if you want to close this (or if it looks stale), I won't mind.
Ah! That's the default for Pipeline, but you aren't using Pipeline, and in general I think we should deprecate Pipeline now that HoloViews is a much more useful interface. Thanks for the reminder! I'll rename this issue so I don't lose track of it.