
datashading with datetime axis fails unexpectedly

Open nr7s opened this issue 5 years ago • 11 comments

I am getting the following error:

Error
/usr/local/anaconda3/envs/arara/lib/python3.6/site-packages/holoviews/element/raster.py in __init__(self, data, kdims, vdims, bounds, extents, xdensity, ydensity, rtol, **params)
    324                              'density.')
    325         SheetCoordinateSystem.__init__(self, bounds, xdensity, ydensity)
--> 326         self._validate(data_bounds, supplied_bounds)
    327 
    328 

/usr/local/anaconda3/envs/arara/lib/python3.6/site-packages/holoviews/element/raster.py in _validate(self, data_bounds, supplied_bounds)
    392                 not_close = True
    393         if not_close:
--> 394             raise ValueError('Supplied Image bounds do not match the coordinates defined '
    395                              'in the data. Bounds only have to be declared if no coordinates '
    396                              'are supplied, otherwise they must match the data. To change '

ValueError: Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.

I tried to understand what was happening with pdb but didn't go very deep into holoviews. What I found is that self.interface is not an ImageInterface but an XArrayInterface, and data is an xarray object rather than an np.ndarray, so the statement below is not executed. https://github.com/pyviz/holoviews/blob/aa134e1d11b456e5f712c1bcfbc9306f1b69dc1c/holoviews/element/raster.py#L316

Here's a stripped-down reproducible example. I tried different DateTime frequencies to understand whether the failure depends on the frequency or on the number of points (originally I was using dask and thought I was plotting too much data). Changing the start date also has an impact.

Reproducible Example
import pandas as pd  # 0.24.1
import numpy as np  # 1.15.4
import holoviews as hv  # 1.11.3
from holoviews.operation.datashader import datashade  # datashader 0.6.9
hv.extension('bokeh')  # bokeh 1.0.4
def test_plot(size, start_date, freq):
    """size: number of points
       freq: frequency on datetime index
    """
    df = pd.DataFrame(data={'a': np.random.normal(0, 0.3, size=size).cumsum() + 50},
                      index=pd.date_range(start_date, periods=size, freq=freq))
    print(f'First date: {df.index.min()}\nLast date: {df.index.max()}')
    return datashade(hv.Scatter(df))
# base case
test_plot(70119, "1980-01-01", '1H') # this works
test_plot(70120, "1980-01-01",  '1H') # this won't
# less points than base case
test_plot(35060, "1980-01-01", '2H') # this works
test_plot(35061, "1980-01-01", '2H') # this won't
# more points than base case
test_plot(4207105, "1980-01-01", '1T') # this works
test_plot(4207106, "1980-01-01", '1T') # this won't
# base case one day ahead
test_plot(70120, "1980-01-02", '1H') # this works
# previous with double points
test_plot(140240, "1980-01-02", '1H') # this won't
# previous 10 years ahead
test_plot(140240, "1990-01-02", '1H') # this works

This has to do with datashading, and it doesn't matter whether the x-axis is originally an index or just a column, although the example uses an index.

nr7s avatar Mar 05 '19 11:03 nr7s

I faced a similar issue with some of my data. On my axis there are a lot of observations with small time gaps in between.

rtmlp avatar Apr 05 '19 18:04 rtmlp

I can confirm this bug (still) exists in the following configuration: Python 3.7.4, Pandas 0.25.1, Numpy 1.16.4, Holoviews 1.12.3, Datashader 0.7.0, Bokeh 1.3.4.

The problem appears not to be in the length of the time series, nor in the sampling. A simple script to show this in a Jupyter Notebook (very similar to @neuronist):

import numpy as np, pandas as pd, holoviews as hv
from holoviews.operation.datashader import datashade
hv.extension('bokeh','matplotlib')

Plotting 100000 elements works fine (one per minute, starting at 1990-01-01):

n = 100000
dates = pd.date_range(start='1990-01-01', freq='1T', periods=n)
curve = hv.Curve((dates,
                  np.random.normal(size=(n,))))
datashade(curve, cmap=['red']).opts(width=400)

Results in: image

However, if I start at 1980-01-01 instead of 1990-01-01:

ValueError: Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.

After that I tried to reproduce the series in the original report, with slightly different results...

| Periods | Start      | Frequency | Result @neuronist | Result (me)               |
|---------|------------|-----------|-------------------|---------------------------|
| 70119   | 1990-01-01 | 1H        | N/A               | Works                     |
| 140240  | 1990-01-02 | 1H        | Works             | Data disappears directly. |
| 140240  | 1980-01-02 | 1H        | Fails             | Fails                     |
| 70119   | 1980-01-01 | 1H        | Works             | Data disappears directly. |
| 70120   | 1980-01-01 | 1H        | Fails             | Fails                     |
| 70120   | 1980-01-02 | 1H        | Works             | Data disappears directly. |
| 35060   | 1980-01-01 | 2H        | Works             | Data disappears directly. |
| 35061   | 1980-01-01 | 2H        | Fails             | Fails                     |
| 4207105 | 1980-01-01 | 1T        | Works             | Data disappears directly. |
| 4207106 | 1980-01-01 | 1T        | Fails             | Fails                     |

This only shows that everything that failed for @neuronist fails for me as well. The cases that worked for him showed the correct result only briefly on my side, although trivial adjustments to the configuration make them work.

fwrite avatar Sep 11 '19 15:09 fwrite

Did some more experiments today.

  • Disabling Numba (NUMBA_DISABLE_JIT=1) makes everything slower (expected) but results in the same error.
  • Conversion of the time series to pydatetime (pd.DatetimeIndex.to_pydatetime()) does not change the outcome.

In the end it boils down to element/raster.py, where there is a mismatch between the extents of the data and the extents of the (to be generated) raster. If we disable this check, everything works (with and without Numba). Furthermore, the immediate disappearance of data I previously reported is solved. Maybe this was caused by an immediate update of the raster that did not meet the tolerance?

The validation of numeric image bounds was introduced in a884989384802d2ae9ef81113e0ad9585843783b/#2617; datetime support was added in c40cdd04b42757e013a431fefe42dfb721a5f558/564ac95 (status unknown)/#2794 (only mentioned in a comment by @philippjfr). In my understanding element/raster.py:386-392 compares the calculated image bounds (left, bottom, right, top) to the data bounds. Here the offset comes into play. My left bound (r =) 1981-01-01T07:00:00.000000 is defined for the image in self.bounds.lbrt() as (c =) 1981-01-18T17:32:51.000000000. An offset of more than 17 days! However, the timescale is almost 40 years, so this error is relatively small. The right bound (2019-03-14T21:00:00.000000) is off by almost 2½ weeks too, and is defined as 2019-02-25T10:27:08.999999000 in self.bounds.lbrt(). The numerical conversion is just a different expression of this, and the np.isclose() comparison fails. (The role of the supplied_bounds variable is unclear to me.)

Are those reasonable offsets? Given a plot width of 400 pixels, there are at most 400/38 ≃ 10½ pixels per year, ignoring the space occupied by the axis labels. An offset of a few weeks at the border pixels is to be expected.
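The dependence on the start date can be illustrated with a small model of the check. This is only a sketch under assumptions that the thread does not confirm: a relative tolerance of 1e-3, a 400-pixel-wide raster, and a left image bound that sits half a pixel away from the left data bound. The helper half_pixel_within_rtol is hypothetical and not part of HoloViews.

```python
import numpy as np

US_PER_HOUR = 3_600_000_000  # microseconds in one hour


def half_pixel_within_rtol(n_points, start, step_us, width=400, rtol=1e-3):
    """Return True if a half-pixel offset at the left bound passes np.isclose.

    Hypothetical helper: width and rtol are assumed values, not read from HoloViews.
    """
    # Left bound as microseconds since the Unix epoch
    left = float(np.datetime64(start, "us").astype("int64"))
    # Time covered by the series, and half the width of one pixel
    span = (n_points - 1) * step_us
    half_pixel = span / width / 2
    # np.isclose allows |a - b| <= atol + rtol * |b|, so the allowance
    # grows with the distance of the bound from the epoch
    return bool(np.isclose(left + half_pixel, left, rtol=rtol))


# Hourly cases from the original report: (number of points, start date)
cases = [
    (70119, "1980-01-01"),
    (70120, "1980-01-01"),
    (70120, "1980-01-02"),
    (140240, "1980-01-02"),
    (140240, "1990-01-02"),
]
results = {case: half_pixel_within_rtol(case[0], case[1], US_PER_HOUR) for case in cases}
for (n_points, start), ok in results.items():
    print(n_points, start, "passes" if ok else "fails")
```

Under these assumptions the model passes and fails for the same hourly cases as the original report. It does not account for every result in this thread, for example the 100000-point minute series starting at 1980-01-01.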

Therefore, a temporary workaround, for those with courage, is to disable the check in element/raster.py:395.

A more permanent solution is to adapt self.rtol, which is currently taken as a static value from the configuration in L289.

The default conversion in dt_to_int() is to microseconds, so the tolerance should be in the same unit. The density (self.xdensity) is 3.318354282103916e-13 (pixels/µs), and self.xstep is calculated from this as 1506771000000 (µs) ≃ 17.44 days (halved for oversampling?). Anyhow, with this self.xdensity each pixel is 1/self.xdensity µs wide, i.e. 34.9 days. The difference should never be more than half a pixel (17.439479 days). Guess how far the right image bound 2019-02-25T10:27:08 is from the data bound 2019-03-14T21:00:00? 17.439479 days...
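These numbers can be re-derived in a few lines. The sketch uses only the values quoted in this comment and assumes that self.xdensity is expressed in pixels per microsecond.

```python
import numpy as np

US_PER_DAY = 86_400_000_000  # microseconds in one day

xdensity = 3.318354282103916e-13  # value quoted above, assumed to be pixels per µs
pixel_width_days = (1.0 / xdensity) / US_PER_DAY
half_pixel_days = pixel_width_days / 2.0

# Right bounds quoted above: one from self.bounds.lbrt(), one from the data
image_right = np.datetime64("2019-02-25T10:27:08.999999", "us")
data_right = np.datetime64("2019-03-14T21:00:00", "us")
offset_days = (data_right - image_right).astype("int64") / US_PER_DAY

print(f"pixel width: {pixel_width_days:.2f} days")
print(f"half pixel:  {half_pixel_days:.6f} days")
print(f"offset:      {offset_days:.6f} days")
```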

A quick solution could be:

not_close = False
rtol_dates = ((1. / self.xdensity) / 2. + self.rtol, (1. / self.ydensity) / 2. + self.rtol) * 2
for r, c, rtol_date in zip(bounds, self.bounds.lbrt(), rtol_dates):
    rtol = self.rtol
    if isinstance(r, util.datetime_types):
        r = util.dt_to_int(r)
        rtol = rtol_date
    if isinstance(c, util.datetime_types):
        c = util.dt_to_int(c)
        rtol = rtol_date
    if util.isfinite(r) and not np.isclose(r, c, rtol=rtol):
        not_close = True
if not_close:
    raise ValueError('Supplied Image bounds do not match the coordinates defined '
                     'in the data. Bounds only have to be declared if no coordinates '
                     'are supplied, otherwise they must match the data. To change '
                     'the displayed extents set the range on the x- and y-dimensions.')
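One thing to keep in mind is that np.isclose treats rtol as a fraction of |c|, so a half-pixel allowance expressed in data units may be easier to reason about as an absolute tolerance (atol). The standalone sketch below shows that variant. The helpers to_number and bounds_close are hypothetical and not HoloViews functions, the y-values in the usage lines are made up, and this is not a patch for element/raster.py.

```python
import datetime as dt

import numpy as np


def to_number(value):
    """Convert datetimes to microseconds since the epoch; pass numbers through."""
    if isinstance(value, (np.datetime64, dt.datetime)):
        return float(np.datetime64(value, "us").astype("int64"))
    return float(value)


def bounds_close(data_lbrt, image_lbrt, xdensity, ydensity, rtol=1e-3):
    """Compare (left, bottom, right, top) bounds, allowing half a pixel of slack."""
    half_x = 0.5 / xdensity  # half a pixel along x, in data units
    half_y = 0.5 / ydensity  # half a pixel along y, in data units
    slack = (half_x, half_y, half_x, half_y)
    for r, c, atol in zip(data_lbrt, image_lbrt, slack):
        r, c = to_number(r), to_number(c)
        if np.isfinite(r) and not np.isclose(r, c, rtol=rtol, atol=atol):
            return False
    return True


# x-bounds quoted earlier in this comment; y-bounds and ydensity are made up
data = (np.datetime64("1981-01-01T07:00:00"), -1.0,
        np.datetime64("2019-03-14T21:00:00"), 1.0)
image = (np.datetime64("1981-01-18T17:32:51"), -1.0,
         np.datetime64("2019-02-25T10:27:08.999999"), 1.0)
ok = bounds_close(data, image, xdensity=3.318354282103916e-13, ydensity=200.0)
print("bounds accepted:", ok)
```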

fwrite avatar Sep 12 '19 12:09 fwrite

Thanks for the very detailed analysis!!

jbednar avatar Sep 12 '19 21:09 jbednar

Not getting any more errors but I don't think we ever applied the suggested fixes.

philippjfr avatar Oct 01 '19 22:10 philippjfr

A quick check on the following system still yields the same error:

Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0], Pandas 0.25.1, Numpy 1.17.2, Holoviews 1.12.5, Datashader 0.7.0, Bokeh 1.3.4

However, the error could have been fixed in master.

fwrite avatar Oct 02 '19 09:10 fwrite

Hi all, I have the same error too.

Base code:

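import pandas as pd
import holoviews as hv
import datashader as ds
from holoviews import opts
from holoviews.operation.datashader import datashade

def get_curve(df, label=''):
    # Parse the timestamp column so the x-axis is a datetime axis
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])
    return hv.Curve(df, ('TIMESTAMP', 'Time'), ('VALUE_NUM', 'Value'), label=label)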

curve = get_curve(df, 'My chart')

Data structure:

curve.options(width=1000)

image Note: there are some gaps in the data

Using datashader:

ds_curve = datashade(curve, normalization='linear', aggregator=ds.count()).opts(opts.RGB(width=1000, height=400))
ds_curve

--> Raises the error mentioned above

I have already tried some fixes:

  1. Increase the rtol as @fwrite mentioned:
hv.extension("bokeh", config=dict(image_rtol=1000))

However, as this number increases, performance becomes very poor.

  2. Convert the datetime to int; the chart works well with good performance:
def get_curve(df, label=''):
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP']).astype('int64')
    return hv.Curve(df, ('TIMESTAMP', 'Time'), ('VALUE_NUM', 'Value'), label=label)

curve = get_curve(df, 'Chart title')
ds_curve = datashade(curve, normalization='linear', aggregator=ds.count()).opts(opts.RGB(width=800, height=300))
ds_curve

image

So I think increasing image_rtol is not a good solution, and not the root cause (because I can plot when converting to int64).

I think we should add an option to disable this error when plotting (for use in other cases such as rendering to HTML, rather than the global rtol option for Jupyter Notebook).

tienlx93 avatar Oct 20 '19 07:10 tienlx93

The problem still appears when using regrid. I noticed that the same amount of data can work fine or break, depending on the dates it covers; see the MRE below.

My env: python 3.7.6, bokeh==2.2.3, datashader==0.11.1, holoviews==1.14.0 (problem also appears on the latest versions: bokeh==2.3.1, datashader==0.12.1, holoviews==1.14.3).

import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.operation.datashader import regrid
hv.extension("bokeh")

nt = 10000
nd = 5000
time = pd.to_datetime(np.arange(nt), unit="s").values  # year 1970 (broken)
# time = pd.to_datetime(np.arange(nt) + 10**9, unit="s").values  # year 2001 (works fine)
distance = np.arange(nd)
data = np.random.rand(nd, nt)

im = hv.Image((time, distance, data))
regrid(im)

Traceback:

WARNING:param.dynamic_operation: Callable raised "ValueError('Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.')".
Invoked as dynamic_operation(height=400, scale=1.0, width=400, x_range=None, y_range=None)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj, include, exclude)
    968 
    969             if method is not None:
--> 970                 return method(include=include, exclude=exclude)
    971             return None
    972         else:

/opt/conda/lib/python3.7/site-packages/holoviews/core/dimension.py in _repr_mimebundle_(self, include, exclude)
   1314         combined and returned.
   1315         """
-> 1316         return Store.render(self)
   1317 
   1318 

/opt/conda/lib/python3.7/site-packages/holoviews/core/options.py in render(cls, obj)
   1403         data, metadata = {}, {}
   1404         for hook in hooks:
-> 1405             ret = hook(obj)
   1406             if ret is None:
   1407                 continue

/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in pprint_display(obj)
    280     if not ip.display_formatter.formatters['text/plain'].pprint:
    281         return None
--> 282     return display(obj, raw_output=True)
    283 
    284 

/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in display(obj, raw_output, **kwargs)
    256     elif isinstance(obj, (HoloMap, DynamicMap)):
    257         with option_state(obj):
--> 258             output = map_display(obj)
    259     elif isinstance(obj, Plot):
    260         output = render(obj)

/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in wrapped(element)
    144         try:
    145             max_frames = OutputSettings.options['max_frames']
--> 146             mimebundle = fn(element, max_frames=max_frames)
    147             if mimebundle is None:
    148                 return {}, {}

/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in map_display(vmap, max_frames)
    204         return None
    205 
--> 206     return render(vmap)
    207 
    208 

/opt/conda/lib/python3.7/site-packages/holoviews/ipython/display_hooks.py in render(obj, **kwargs)
     66         renderer = renderer.instance(fig='png')
     67 
---> 68     return renderer.components(obj, **kwargs)
     69 
     70 

/opt/conda/lib/python3.7/site-packages/holoviews/plotting/renderer.py in components(self, obj, fmt, comm, **kwargs)
    408                 doc = Document()
    409                 with config.set(embed=embed):
--> 410                     model = plot.layout._render_model(doc, comm)
    411                 if embed:
    412                     return render_model(model, comm)

/opt/conda/lib/python3.7/site-packages/panel/viewable.py in _render_model(self, doc, comm)
    422         if comm is None:
    423             comm = state._comm_manager.get_server_comm()
--> 424         model = self.get_root(doc, comm)
    425 
    426         if config.embed:

/opt/conda/lib/python3.7/site-packages/panel/viewable.py in get_root(self, doc, comm, preprocess)
    480         """
    481         doc = init_doc(doc)
--> 482         root = self._get_model(doc, comm=comm)
    483         if preprocess:
    484             self._preprocess(root)

/opt/conda/lib/python3.7/site-packages/panel/layout/base.py in _get_model(self, doc, root, parent, comm)
    110         if root is None:
    111             root = model
--> 112         objects = self._get_objects(model, [], doc, root, comm)
    113         props = dict(self._init_properties(), objects=objects)
    114         model.update(**self._process_param_change(props))

/opt/conda/lib/python3.7/site-packages/panel/layout/base.py in _get_objects(self, model, old_objects, doc, root, comm)
    100             else:
    101                 try:
--> 102                     child = pane._get_model(doc, root, model, comm)
    103                 except RerenderError:
    104                     return self._get_objects(model, current_objects[:i], doc, root, comm)

/opt/conda/lib/python3.7/site-packages/panel/pane/holoviews.py in _get_model(self, doc, root, parent, comm)
    239             plot = self.object
    240         else:
--> 241             plot = self._render(doc, comm, root)
    242 
    243         plot.pane = self

/opt/conda/lib/python3.7/site-packages/panel/pane/holoviews.py in _render(self, doc, comm, root)
    304                 kwargs['comm'] = comm
    305 
--> 306         return renderer.get_plot(self.object, **kwargs)
    307 
    308     def _cleanup(self, root):

/opt/conda/lib/python3.7/site-packages/holoviews/plotting/bokeh/renderer.py in get_plot(self_or_cls, obj, doc, renderer, **kwargs)
     71         combining the bokeh model with another plot.
     72         """
---> 73         plot = super(BokehRenderer, self_or_cls).get_plot(obj, doc, renderer, **kwargs)
     74         if plot.document is None:
     75             plot.document = Document() if self_or_cls.notebook_context else curdoc()

/opt/conda/lib/python3.7/site-packages/holoviews/plotting/renderer.py in get_plot(self_or_cls, obj, doc, renderer, comm, **kwargs)
    218 
    219         # Initialize DynamicMaps with first data item
--> 220         initialize_dynamic(obj)
    221 
    222         if not renderer:

/opt/conda/lib/python3.7/site-packages/holoviews/plotting/util.py in initialize_dynamic(obj)
    250             continue
    251         if not len(dmap):
--> 252             dmap[dmap._initial_key()]
    253 
    254 

/opt/conda/lib/python3.7/site-packages/holoviews/core/spaces.py in __getitem__(self, key)
   1329         # Not a cross product and nothing cached so compute element.
   1330         if cache is not None: return cache
-> 1331         val = self._execute_callback(*tuple_key)
   1332         if data_slice:
   1333             val = self._dataslice(val, data_slice)

/opt/conda/lib/python3.7/site-packages/holoviews/core/spaces.py in _execute_callback(self, *args)
   1098 
   1099         with dynamicmap_memoization(self.callback, self.streams):
-> 1100             retval = self.callback(*args, **kwargs)
   1101         return self._style(retval)
   1102 

/opt/conda/lib/python3.7/site-packages/holoviews/core/spaces.py in __call__(self, *args, **kwargs)
    712 
    713         try:
--> 714             ret = self.callable(*args, **kwargs)
    715         except KeyError:
    716             # KeyError is caught separately because it is used to signal

/opt/conda/lib/python3.7/site-packages/holoviews/util/__init__.py in dynamic_operation(*key, **kwargs)
   1017         def dynamic_operation(*key, **kwargs):
   1018             key, obj = resolve(key, kwargs)
-> 1019             return apply(obj, *key, **kwargs)
   1020 
   1021         operation = self.p.operation

/opt/conda/lib/python3.7/site-packages/holoviews/util/__init__.py in apply(element, *key, **kwargs)
   1009         def apply(element, *key, **kwargs):
   1010             kwargs = dict(util.resolve_dependent_kwargs(self.p.kwargs), **kwargs)
-> 1011             processed = self._process(element, key, kwargs)
   1012             if (self.p.link_dataset and isinstance(element, Dataset) and
   1013                 isinstance(processed, Dataset) and processed._dataset is None):

/opt/conda/lib/python3.7/site-packages/holoviews/util/__init__.py in _process(self, element, key, kwargs)
    991         elif isinstance(self.p.operation, Operation):
    992             kwargs = {k: v for k, v in kwargs.items() if k in self.p.operation.param}
--> 993             return self.p.operation.process_element(element, key, **kwargs)
    994         else:
    995             return self.p.operation(element, **kwargs)

/opt/conda/lib/python3.7/site-packages/holoviews/core/operation.py in process_element(self, element, key, **params)
    192             self.p = param.ParamOverrides(self, params,
    193                                           allow_extra_keywords=self._allow_extra_keywords)
--> 194         return self._apply(element, key)
    195 
    196 

/opt/conda/lib/python3.7/site-packages/holoviews/core/operation.py in _apply(self, element, key)
    139             if not in_method:
    140                 element._in_method = True
--> 141         ret = self._process(element, key)
    142         if hasattr(element, '_in_method') and not in_method:
    143             element._in_method = in_method

/opt/conda/lib/python3.7/site-packages/holoviews/operation/datashader.py in _process(self, element, key)
    948         regridded = xr.Dataset(regridded)
    949 
--> 950         return element.clone(regridded, datatype=['xarray']+element.datatype, **params)
    951 
    952 

/opt/conda/lib/python3.7/site-packages/holoviews/element/raster.py in clone(self, data, shared_data, new_type, link, *args, **overrides)
    428             overrides = dict(sheet_params, **overrides)
    429         return super(Image, self).clone(data, shared_data, new_type, link,
--> 430                                         *args, **overrides)
    431 
    432 

/opt/conda/lib/python3.7/site-packages/holoviews/core/data/__init__.py in clone(self, data, shared_data, new_type, link, *args, **overrides)
   1204 
   1205         return super(Dataset, self).clone(
-> 1206             data, shared_data, new_type, *args, **overrides
   1207         )
   1208 

/opt/conda/lib/python3.7/site-packages/holoviews/core/dimension.py in clone(self, data, shared_data, new_type, link, *args, **overrides)
    573         # Apply name mangling for __ attribute
    574         pos_args = getattr(self, '_' + type(self).__name__ + '__pos_params', [])
--> 575         return clone_type(data, *args, **{k:v for k,v in settings.items()
    576                                           if k not in pos_args})
    577 

/opt/conda/lib/python3.7/site-packages/holoviews/element/raster.py in __init__(self, data, kdims, vdims, bounds, extents, xdensity, ydensity, rtol, **params)
    327         if non_finite:
    328            self.bounds = BoundingBox(points=((np.nan, np.nan), (np.nan, np.nan)))
--> 329         self._validate(data_bounds, supplied_bounds)
    330 
    331     def _validate(self, data_bounds, supplied_bounds):

/opt/conda/lib/python3.7/site-packages/holoviews/element/raster.py in _validate(self, data_bounds, supplied_bounds)
    394                 not_close = True
    395         if not_close:
--> 396             raise ValueError('Supplied Image bounds do not match the coordinates defined '
    397                              'in the data. Bounds only have to be declared if no coordinates '
    398                              'are supplied, otherwise they must match the data. To change '

ValueError: Supplied Image bounds do not match the coordinates defined in the data. Bounds only have to be declared if no coordinates are supplied, otherwise they must match the data. To change the displayed extents set the range on the x- and y-dimensions.
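A possible reading of why the 1970 series breaks while the 2001 series works: a relative tolerance scales with the magnitude of the value being compared, and timestamps near the Unix epoch convert to numbers close to zero. The sketch below assumes a relative tolerance of 1e-3, a 400-pixel-wide raster (as in the dynamic_operation call above) and a half-pixel offset between the two left bounds. None of this has been checked against the HoloViews source.

```python
import numpy as np

nt = 10000    # samples, one per second, as in the MRE
width = 400   # pixels, taken from the dynamic_operation call in the traceback
rtol = 1e-3   # assumed relative tolerance

# Half a pixel expressed in microseconds
half_pixel_us = (nt - 1) * 1_000_000 / width / 2

outcome = {}
for label, first_second in [("1970", 0), ("2001", 10**9)]:
    left_us = float(first_second) * 1_000_000  # left bound in µs since the epoch
    # The allowance is atol + rtol * |left_us|, which is almost zero at the epoch
    outcome[label] = bool(np.isclose(left_us + half_pixel_us, left_us, rtol=rtol))
    print(label, "bounds considered close:", outcome[label])
```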

rafgonsi avatar May 07 '21 11:05 rafgonsi

Does anyone have any updates on workarounds or progress toward fixes for this issue? Converting datetimes to ints didn't solve the problem for me (I still run into Supplied Image bounds do not match the coordinates defined in the data) when attempting to rasterize a curve.

edit: increasing rtol seems to work for my particular use-case 🤷‍♂️: hv.extension("bokeh", config=dict(image_rtol=1000))

sjdemartini avatar Oct 08 '21 17:10 sjdemartini

Any progress on this? Large time series can really benefit from this visualization.

dwr-psandhu avatar Jan 04 '24 18:01 dwr-psandhu

Happy 5th birthday to this issue 🎂

Still suffering from it in 2024

openSourcerer9000 avatar May 02 '24 16:05 openSourcerer9000