datashading with datetime axis fails unexpectedly
I am getting the following error:
Error
/usr/local/anaconda3/envs/arara/lib/python3.6/site-packages/holoviews/element/raster.py in __init__(self, data, kdims, vdims, bounds, extents, xdensity, ydensity, rtol, **params)
324 'density.')
325 SheetCoordinateSystem.__init__(self, bounds, xdensity, ydensity)
--> 326 self._validate(data_bounds, supplied_bounds)
327
328
/usr/local/anaconda3/envs/arara/lib/python3.6/site-packages/holoviews/element/raster.py in _validate(self, data_bounds, supplied_bounds)
392 not_close = True
393 if not_close:
--> 394 raise ValueError('Supplied Image bounds do not match the coordinates defined '
395 'in the data. Bounds only have to be declared if no coordinates '
396 'are supplied, otherwise they must match the data. To change '
ValueError: Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.
I tried understanding what was happening with pdb, but didn't go that deep into holoviews. What I found is that self.interface is not an ImageInterface but an XArrayInterface, and data is an xarray, not a np.ndarray, so the statement below is not executed:
https://github.com/pyviz/holoviews/blob/aa134e1d11b456e5f712c1bcfbc9306f1b69dc1c/holoviews/element/raster.py#L316
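Roughly how I poked at it (a sketch; test_plot is the helper from the reproducible example below, and hv.render is only used here to force the DynamicMap returned by datashade to evaluate):
import pdb

try:
    hv.render(test_plot(70120, "1980-01-01", '1H'))  # one of the failing cases below
except ValueError:
    # drops into Image._validate, where self.interface turns out to be XArrayInterface
    pdb.post_mortem()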
Here's a stripped-down reproducible example. I tried different datetime frequencies to understand whether it had to do with the frequency or with the number of points (originally I was using dask and thought I was plotting too much data). Messing with the initial start date also has an impact.
Reproducible Example
import pandas as pd # 0.24.1
import numpy as np # 1.15.4
import holoviews as hv # 1.11.3
from holoviews.operation.datashader import datashade # datashader 0.6.9
hv.extension('bokeh') # bokeh 1.0.4
def test_plot(size, start_date, freq):
    """size: number of points
    freq: frequency on datetime index
    """
    df = pd.DataFrame(data={'a': np.random.normal(0, 0.3, size=size).cumsum() + 50},
                      index=pd.date_range(start_date, periods=size, freq=freq))
    print(f'First date: {df.index.min()}\nLast date: {df.index.max()}')
    return datashade(hv.Scatter(df))
# base case
test_plot(70119, "1980-01-01", '1H') # this works
test_plot(70120, "1980-01-01", '1H') # this won't
# less points than base case
test_plot(35060, "1980-01-01", '2H') # this works
test_plot(35061, "1980-01-01", '2H') # this won't
# more points than base case
test_plot(4207105, "1980-01-01", '1T') # this works
test_plot(4207106, "1980-01-01", '1T') # this won't
# base case one day ahead
test_plot(70120, "1980-01-02", '1H') # this works
# previous with double points
test_plot(140240, "1980-01-02", '1H') # this won't
# previous 10 years ahead
test_plot(140240, "1990-01-02", '1H') # this works
This has to do with datashading, and it doesn't matter whether the x-axis is originally an index or just a column (in the example above we are using an index).
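For completeness, a sketch of the column case with the same failing size as the base case above (the column names 't' and 'a' are arbitrary choices of mine):
df = pd.DataFrame({'t': pd.date_range("1980-01-01", periods=70120, freq='1H'),
                   'a': np.random.normal(0, 0.3, size=70120).cumsum() + 50})
datashade(hv.Scatter(df, 't', 'a'))  # hits the same ValueError once rendered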
I faced a similar issue with some of my data. On my axis there are lots of observations with small time gaps in between.
I can confirm this bug (still) exists in the following configuration: Python 3.7.4, Pandas 0.25.1, Numpy 1.16.4, Holoviews 1.12.3, Datashader 0.7.0, Bokeh 1.3.4.
The problem appears not to be in the length of the time series, nor in the sampling. A simple script to show this in a Jupyter notebook (very similar to @neuronist's):
import numpy as np, pandas as pd, holoviews as hv
from holoviews.operation.datashader import datashade
hv.extension('bokeh','matplotlib')
Plotting 100000 elements works fine (one per minute, starting at 1990-01-01):
n = 100000
dates = pd.date_range(start='1990-01-01', freq='1T', periods=n)
curve = hv.Curve((dates, np.random.normal(size=(n,))))
datashade(curve, cmap=['red']).opts(width=400)
This results in a correctly rendered datashaded curve.
However, if I start at 1980-01-01 instead of 1990-01-01:
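(A sketch of that failing variant, reusing n from above with only the start date changed:)
dates = pd.date_range(start='1980-01-01', freq='1T', periods=n)
curve = hv.Curve((dates, np.random.normal(size=(n,))))
datashade(curve, cmap=['red']).opts(width=400)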
ValueError: Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.
After that I tried to reproduce the series in the original report, with slightly different results...
Periods | Start | Frequency | Result @neuronist | Result (me) |
---|---|---|---|---|
70119 | 1990-01-01 | 1H | N/A | Works |
140240 | 1990-01-02 | 1H | Works | Data disappears directly. |
140240 | 1980-01-02 | 1H | Fails | Fails |
70119 | 1980-01-01 | 1H | Works | Data disappears directly. |
70120 | 1980-01-01 | 1H | Fails | Fails |
70120 | 1980-01-02 | 1H | Works | Data disappears directly. |
35060 | 1980-01-01 | 2H | Works | Data disappears directly. |
35061 | 1980-01-01 | 2H | Fails | Fails |
4207105 | 1980-01-01 | 1T | Works | Data disappears directly. |
4207106 | 1980-01-01 | 1T | Fails | Fails |
This shows that anything that failed for @neuronist also fails for me, while the cases that worked for him only briefly show the correct result on my side before the data disappears; trivial adjustments to the configuration change whether a case works.
I did some more experiments today.
- Disabling Numba (NUMBA_DISABLE_JIT=1) makes everything slower (expected) but results in the same error.
- Converting the time series to pydatetime (pd.DatetimeIndex.to_pydatetime()) does not change the outcome.
In the end it boils down to element/raster.py, where there is a mismatch between the extents of the data and the extents of the (to be generated) raster. If we disable this error, everything works (with and without Numba). Furthermore, the immediate disappearance of data I previously reported is solved. Maybe that was caused by an immediate update of the raster that did not meet the tolerance?
The validation of numeric Image bounds was introduced in a884989384802d2ae9ef81113e0ad9585843783b (#2617); datetime support was added in c40cdd04b42757e013a431fefe42dfb721a5f558/564ac95 (status unknown) and #2794 (only mentioned in a comment by @philippjfr). In my understanding, element/raster.py:386-392 compares the calculated image bounds (left, bottom, right, top) to the data bounds, and here the offset comes into play. My left data bound (r) is 1981-01-01T07:00:00.000000, while the image defines it in self.bounds.lbrt() (c) as 1981-01-18T17:32:51.000000000. That is an offset of more than 17 days! However, the timescale is almost 40 years, so this error is relatively small. The right data bound (2019-03-14T21:00:00.000000) is off by more than 2½ weeks too, defined as 2019-02-25T10:27:08.999999000 in self.bounds.lbrt(). The numerical conversion is just a different expression of the same values, and the comparison with np.isclose() obviously fails. (The role of the supplied_bounds variable is unclear to me.)
Are those reasonable offsets? Given a plot width of 400 pixels, there are fewer than 400/38 ≃ 10½ pixels per year, ignoring the space occupied by the axis labels. An offset of three weeks at the border pixels is therefore to be expected.
Therefore a temporary workaround, for those with courage, is to just disable the error in element/raster.py:395.
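For those who would rather not edit the installed file, roughly the same effect can be achieved by monkeypatching the check away at runtime (a sketch, not an official API, and it silences the bounds validation for every Image, not just the datetime ones):
import holoviews as hv

# Replace Image._validate with a no-op so the bounds mismatch no longer raises.
hv.Image._validate = lambda self, data_bounds, supplied_bounds: None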
A more permanent solution is to adapt self.rtol, currently taken as a static value from the configuration in L289. The default conversion in dt_to_int() is to microseconds, therefore the tolerance should be in the same unit. The density (self.xdensity) is 3.318354282103916e-13 (pixels/µs); self.xstep is calculated from this as 1506771000000 µs ≃ 17.44 days, halved for oversampling? Anyhow, with this self.xdensity each pixel is 1/self.xdensity µs wide, i.e. 34.9 days. The difference should never be more than half a pixel (17.439479 days). Guess how far the right data bound 2019-02-25T10:27:08 is from the image bound 2019-03-14T21:00:00? 17.439479 days...
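A quick back-of-the-envelope check of those numbers (plain Python, values copied from the analysis above):
us_per_day = 24 * 60 * 60 * 1e6      # microseconds per day
xdensity = 3.318354282103916e-13     # pixels per microsecond, from above

pixel_width_days = (1.0 / xdensity) / us_per_day
print(pixel_width_days)       # ≈ 34.9 days per pixel
print(pixel_width_days / 2)   # ≈ 17.44 days, matching the observed offset of 17.439479 days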
A quick solution could be:
not_close = False
rtol_dates = ((1. / self.xdensity) / 2. + self.rtol,
              (1. / self.ydensity) / 2. + self.rtol) * 2
for r, c, rtol_date in zip(data_bounds, self.bounds.lbrt(), rtol_dates):
    rtol = self.rtol
    if isinstance(r, util.datetime_types):
        r = util.dt_to_int(r)
        rtol = rtol_date
    if isinstance(c, util.datetime_types):
        c = util.dt_to_int(c)
        rtol = rtol_date
    if util.isfinite(r) and not np.isclose(r, c, rtol=rtol):
        not_close = True
if not_close:
    raise ValueError('Supplied Image bounds do not match the coordinates defined '
                     'in the data. Bounds only have to be declared if no coordinates '
                     'are supplied, otherwise they must match the data. To change '
                     'the displayed extents set the range on the x- and y-dimensions.')
Thanks for the very detailed analysis!!
I'm not getting any more errors, but I don't think we ever applied the suggested fixes.
A quick check still yields the same error on my system:
Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0] Pandas 0.25.1 Numpy 1.17.2 Holoviews 1.12.5 Datashader 0.7.0 Bokeh 1.3.4
However, the error could have been fixed in master.
Hi all, I have the same error too.
Base code:
def get_curve(df, label=''):
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])
    return hv.Curve(df, ('TIMESTAMP', 'Time'), ('VALUE_NUM', 'Value'), label=label)

curve = get_curve(df, 'My chart')
Data structure:
curve.options(width=1000)
Note: there are some gaps in the data.
Using datashader:
ds_curve = datashade(curve, normalization='linear', aggregator=ds.count()).opts(opts.RGB(width=1000, height=400))
ds_curve
--> This raises the error mentioned above.
I already tried some fixes:
- Increase rtol as @fwrite mentions:
hv.extension("bokeh", config=dict(image_rtol=1000))
However, when this number increases, the performance is very poor.
- Convert the datetime to int; the chart works well with good performance:
def get_curve(df, label=''):
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP']).astype('int64')
    return hv.Curve(df, ('TIMESTAMP', 'Time'), ('VALUE_NUM', 'Value'), label=label)
curve = get_curve(df, 'Chart title')
ds_curve = datashade(curve, normalization='linear', aggregator=ds.count()).opts(opts.RGB(width=800, height=300))
ds_curve
So I think increasing image_rtol is not a good solution, and it is not the root cause (because I can plot after converting to int64).
I think we should add an option to disable this error when plotting (so it can be used in other cases, like rendering to HTML, rather than the global rtol option for the Jupyter notebook).
The problem still appears when using regrid. I noticed that the same amount of data can work fine or break depending on the dates it covers; see the MRE below.
My env: python 3.7.6, bokeh==2.2.3, datashader==0.11.1, holoviews==1.14.0 (problem also appears on the latest versions: bokeh==2.3.1, datashader==0.12.1, holoviews==1.14.3).
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.operation.datashader import regrid
hv.extension("bokeh")
nt = 10000
nd = 5000
time = pd.to_datetime(np.arange(nt), unit="s").values # year 1970 (broken)
# time = pd.to_datetime(np.arange(nt) + 10**9, unit="s").values # year 2001 (works fine)
distance = np.arange(nd)
data = np.random.rand(nd, nt)
im = hv.Image((time, distance, data))
regrid(im)
Traceback:
WARNING:param.dynamic_operation: Callable raised "ValueError('Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.')".
Invoked as dynamic_operation(height=400, scale=1.0, width=400, x_range=None, y_range=None)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj, include, exclude)
968
969 if method is not None:
--> 970 return method(include=include, exclude=exclude)
971 return None
972 else:
/opt/conda/lib/python3.7/site-packages/holoviews/core/dimension.py in _repr_mimebundle_(self, include, exclude)
1314 combined and returned.
1315 """
-> 1316 return Store.render(self)
1317
1318
/opt/conda/lib/python3.7/site-packages/holoviews/core/options.py in render(cls, obj)
1403 data, metadata = {}, {}
1404 for hook in hooks:
-> 1405 ret = hook(obj)
1406 if ret is None:
1407 continue
/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in pprint_display(obj)
280 if not ip.display_formatter.formatters['text/plain'].pprint:
281 return None
--> 282 return display(obj, raw_output=True)
283
284
/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in display(obj, raw_output, **kwargs)
256 elif isinstance(obj, (HoloMap, DynamicMap)):
257 with option_state(obj):
--> 258 output = map_display(obj)
259 elif isinstance(obj, Plot):
260 output = render(obj)
/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in wrapped(element)
144 try:
145 max_frames = OutputSettings.options['max_frames']
--> 146 mimebundle = fn(element, max_frames=max_frames)
147 if mimebundle is None:
148 return {}, {}
/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in map_display(vmap, max_frames)
204 return None
205
--> 206 return render(vmap)
207
208
/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in render(obj, **kwargs)
66 renderer = renderer.instance(fig='png')
67
---> 68 return renderer.components(obj, **kwargs)
69
70
/opt/conda/lib/python3.7/site-packages/holoviews/plotting/renderer.py in components(self, obj, fmt, comm, **kwargs)
408 doc = Document()
409 with config.set(embed=embed):
--> 410 model = plot.layout._render_model(doc, comm)
411 if embed:
412 return render_model(model, comm)
/opt/conda/lib/python3.7/site-packages/panel/viewable.py in _render_model(self, doc, comm)
422 if comm is None:
423 comm = state._comm_manager.get_server_comm()
--> 424 model = self.get_root(doc, comm)
425
426 if config.embed:
/opt/conda/lib/python3.7/site-packages/panel/viewable.py in get_root(self, doc, comm, preprocess)
480 """
481 doc = init_doc(doc)
--> 482 root = self._get_model(doc, comm=comm)
483 if preprocess:
484 self._preprocess(root)
/opt/conda/lib/python3.7/site-packages/panel/layout/base.py in _get_model(self, doc, root, parent, comm)
110 if root is None:
111 root = model
--> 112 objects = self._get_objects(model, [], doc, root, comm)
113 props = dict(self._init_properties(), objects=objects)
114 model.update(**self._process_param_change(props))
/opt/conda/lib/python3.7/site-packages/panel/layout/base.py in _get_objects(self, model, old_objects, doc, root, comm)
100 else:
101 try:
--> 102 child = pane._get_model(doc, root, model, comm)
103 except RerenderError:
104 return self._get_objects(model, current_objects[:i], doc, root, comm)
/opt/conda/lib/python3.7/site-packages/panel/pane/holoviews.py in _get_model(self, doc, root, parent, comm)
239 plot = self.object
240 else:
--> 241 plot = self._render(doc, comm, root)
242
243 plot.pane = self
/opt/conda/lib/python3.7/site-packages/panel/pane/holoviews.py in _render(self, doc, comm, root)
304 kwargs['comm'] = comm
305
--> 306 return renderer.get_plot(self.object, **kwargs)
307
308 def _cleanup(self, root):
/opt/conda/lib/python3.7/site-packages/holoviews/plotting/bokeh/renderer.py in get_plot(self_or_cls, obj, doc, renderer, **kwargs)
71 combining the bokeh model with another plot.
72 """
---> 73 plot = super(BokehRenderer, self_or_cls).get_plot(obj, doc, renderer, **kwargs)
74 if plot.document is None:
75 plot.document = Document() if self_or_cls.notebook_context else curdoc()
/opt/conda/lib/python3.7/site-packages/holoviews/plotting/renderer.py in get_plot(self_or_cls, obj, doc, renderer, comm, **kwargs)
218
219 # Initialize DynamicMaps with first data item
--> 220 initialize_dynamic(obj)
221
222 if not renderer:
/opt/conda/lib/python3.7/site-packages/holoviews/plotting/util.py in initialize_dynamic(obj)
250 continue
251 if not len(dmap):
--> 252 dmap[dmap._initial_key()]
253
254
/opt/conda/lib/python3.7/site-packages/holoviews/core/spaces.py in __getitem__(self, key)
1329 # Not a cross product and nothing cached so compute element.
1330 if cache is not None: return cache
-> 1331 val = self._execute_callback(*tuple_key)
1332 if data_slice:
1333 val = self._dataslice(val, data_slice)
/opt/conda/lib/python3.7/site-packages/holoviews/core/spaces.py in _execute_callback(self, *args)
1098
1099 with dynamicmap_memoization(self.callback, self.streams):
-> 1100 retval = self.callback(*args, **kwargs)
1101 return self._style(retval)
1102
/opt/conda/lib/python3.7/site-packages/holoviews/core/spaces.py in __call__(self, *args, **kwargs)
712
713 try:
--> 714 ret = self.callable(*args, **kwargs)
715 except KeyError:
716 # KeyError is caught separately because it is used to signal
/opt/conda/lib/python3.7/site-packages/holoviews/util/__init__.py in dynamic_operation(*key, **kwargs)
1017 def dynamic_operation(*key, **kwargs):
1018 key, obj = resolve(key, kwargs)
-> 1019 return apply(obj, *key, **kwargs)
1020
1021 operation = self.p.operation
/opt/conda/lib/python3.7/site-packages/holoviews/util/__init__.py in apply(element, *key, **kwargs)
1009 def apply(element, *key, **kwargs):
1010 kwargs = dict(util.resolve_dependent_kwargs(self.p.kwargs), **kwargs)
-> 1011 processed = self._process(element, key, kwargs)
1012 if (self.p.link_dataset and isinstance(element, Dataset) and
1013 isinstance(processed, Dataset) and processed._dataset is None):
/opt/conda/lib/python3.7/site-packages/holoviews/util/__init__.py in _process(self, element, key, kwargs)
991 elif isinstance(self.p.operation, Operation):
992 kwargs = {k: v for k, v in kwargs.items() if k in self.p.operation.param}
--> 993 return self.p.operation.process_element(element, key, **kwargs)
994 else:
995 return self.p.operation(element, **kwargs)
/opt/conda/lib/python3.7/site-packages/holoviews/core/operation.py in process_element(self, element, key, **params)
192 self.p = param.ParamOverrides(self, params,
193 allow_extra_keywords=self._allow_extra_keywords)
--> 194 return self._apply(element, key)
195
196
/opt/conda/lib/python3.7/site-packages/holoviews/core/operation.py in _apply(self, element, key)
139 if not in_method:
140 element._in_method = True
--> 141 ret = self._process(element, key)
142 if hasattr(element, '_in_method') and not in_method:
143 element._in_method = in_method
/opt/conda/lib/python3.7/site-packages/holoviews/operation/datashader.py in _process(self, element, key)
948 regridded = xr.Dataset(regridded)
949
--> 950 return element.clone(regridded, datatype=['xarray']+element.datatype, **params)
951
952
/opt/conda/lib/python3.7/site-packages/holoviews/element/raster.py in clone(self, data, shared_data, new_type, link, *args, **overrides)
428 overrides = dict(sheet_params, **overrides)
429 return super(Image, self).clone(data, shared_data, new_type, link,
--> 430 *args, **overrides)
431
432
/opt/conda/lib/python3.7/site-packages/holoviews/core/data/__init__.py in clone(self, data, shared_data, new_type, link, *args, **overrides)
1204
1205 return super(Dataset, self).clone(
-> 1206 data, shared_data, new_type, *args, **overrides
1207 )
1208
/opt/conda/lib/python3.7/site-packages/holoviews/core/dimension.py in clone(self, data, shared_data, new_type, link, *args, **overrides)
573 # Apply name mangling for __ attribute
574 pos_args = getattr(self, '_' + type(self).__name__ + '__pos_params', [])
--> 575 return clone_type(data, *args, **{k:v for k,v in settings.items()
576 if k not in pos_args})
577
/opt/conda/lib/python3.7/site-packages/holoviews/element/raster.py in __init__(self, data, kdims, vdims, bounds, extents, xdensity, ydensity, rtol, **params)
327 if non_finite:
328 self.bounds = BoundingBox(points=((np.nan, np.nan), (np.nan, np.nan)))
--> 329 self._validate(data_bounds, supplied_bounds)
330
331 def _validate(self, data_bounds, supplied_bounds):
/opt/conda/lib/python3.7/site-packages/holoviews/element/raster.py in _validate(self, data_bounds, supplied_bounds)
394 not_close = True
395 if not_close:
--> 396 raise ValueError('Supplied Image bounds do not match the coordinates defined '
397 'in the data. Bounds only have to be declared if no coordinates '
398 'are supplied, otherwise they must match the data. To change '
ValueError: Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.
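For what it's worth, the int64 workaround mentioned earlier in the thread can be tried on this MRE too (a sketch; whether it actually avoids the error here is an assumption on my part, and the x-axis is then labelled in nanoseconds since the epoch rather than as dates):
im_int = hv.Image((time.astype('int64'), distance, data))
regrid(im_int)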
Does anyone have any updates on workarounds or progress toward fixes for this issue? Converting datetimes to ints didn't solve the problem for me (I still run into "Supplied Image bounds do not match the coordinates defined in the data") when attempting to rasterize a curve.
edit: increasing rtol seems to work for my particular use-case 🤷‍♂️: hv.extension("bokeh", config=dict(image_rtol=1000))
Any progress on this? Large time series can really benefit from this visualization.
Happy 5th birthday to this issue 🎂
Still suffering from it in 2024