bokeh
bokeh copied to clipboard
multi_line broken when passing numpy arrays
bokeh version 1.4.0 python 3.7.0 jupyter notebook 6.0.2 macOS 10.15.2
Description of expected behavior and the observed behavior
This should work:
import bokeh.plotting
fig = bokeh.plotting.figure()
fig.multi_line(numpy.array([[1, 3, 2], [3, 4, 6]]), numpy.array([[2, 1, 4], [4, 7, 8]]))
However, it doesn't (see error message below).
For the record, the following code, without using numpy arrays, works without errors:
import bokeh.plotting
fig = bokeh.plotting.figure()
fig.multi_line(([[1, 3, 2], [3, 4, 6]]), ([[2, 1, 4], [4, 7, 8]]))
bokeh.io.show(fig)
Complete, minimal, self-contained example code that reproduces the issue
see above.
Stack traceback and/or browser JavaScript console output
Error produced with example above:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-60-d6c8a8fa1428> in <module>
1 import bokeh.plotting
2 fig = bokeh.plotting.figure()
----> 3 fig.multi_line(numpy.array([[1, 3, 2], [3, 4, 6]]), numpy.array([[2, 1, 4], [4, 7, 8]]))
fakesource in multiline(self, xs, ys, **kwargs)
~/anaconda3/envs/nteract/lib/python3.7/site-packages/bokeh/plotting/helpers.py in func(self, **kwargs)
890 glyph_ca = _pop_colors_and_alpha(glyphclass, kwargs)
891 incompatible_literal_spec_values = []
--> 892 incompatible_literal_spec_values += _process_sequence_literals(glyphclass, kwargs, source, is_user_source)
893 incompatible_literal_spec_values += _process_sequence_literals(glyphclass, glyph_ca, source, is_user_source)
894 if incompatible_literal_spec_values:
~/anaconda3/envs/nteract/lib/python3.7/site-packages/bokeh/plotting/helpers.py in _process_sequence_literals(glyphclass, kwargs, source, is_user_source)
373
374 if isinstance(val, np.ndarray) and val.ndim != 1:
--> 375 raise RuntimeError("Columns need to be 1D (%s is not)" % var)
376
377 if is_user_source:
RuntimeError: Columns need to be 1D (xs is not)
Screenshots or screencasts of the bug in action
@poke1024 This is not a bug, as there is not currently any expectation that things should work with numpy arrays. The glyphs requires "lists of lists" or "lists of arrays" (because in general, the sub-lines have no expectation to be the same length). One way to adapt things would be to use a list comprehension to make a list of arrays:
In [11]: a = np.array([[1,2], [3,4]])
In [12]: lst = [x[i:] for i, x in enumerate(a)]
In [13]: lst
Out[13]: [array([1, 2]), array([4])]
It might make sense, as a convenience, to perform this adaptation automatically internally. It is a nice pure Python task that would be good for a new contributor. Would you be interested to work on the feature?
Yes, why not.
Also I think it's good to support this, as the examples seem to imply some support of numpy like here: https://github.com/bokeh/bokeh/blob/c3e858b88a318ff4faa4c663b1f2c2cc53fec5ee/examples/plotting/file/lorenz.py#L32
I looked into this a bit. The whole problem is that the check in https://github.com/bokeh/bokeh/blob/c3e858b88a318ff4faa4c663b1f2c2cc53fec5ee/bokeh/plotting/helpers.py#L463 is not correct for MultiLine and MultiPolygon. For these cases, it needs to check against val.ndim != 2. With that fix, everything works fine.
There seem two obivous ways to fix this: (1) at location, by adding some hard-coded glyphclass in (MultiLine, MultiPolygon) check, or (2) by adding some new datadim property to https://github.com/bokeh/bokeh/blob/c3e858b88a318ff4faa4c663b1f2c2cc53fec5ee/bokeh/core/has_props.py#L232.
Also I think it's good to support this, as the examples seem to imply some support of numpy like here:
@poke1024 That example does what I suggested above: it provides a list of 1-d arrays:
In [15]: np.array_split(xprime, 7)
Out[15]:
[array([12.02081528, 11.56800288, 11.11943992, ..., 14.16587838,
14.91413732, 15.68758183]),
...
array([-17.12658567, -17.31135881, -17.47077918, ..., -22.26512893,
-22.36096228, -22.37965827])]
(1) at location, by adding some hard-coded glyphclass in (MultiLine, MultiPolygon) check, or (2) by adding some new datadim property to
I think that is a reasonable place for a dim check. It slightly surprises me that that is sufficient. My concern would be that in the Bokeh server app case, things falls back to the very inefficient JSON encoding, and not the binary transport. I would expect to need to also need explicit logic to split the 2-d array into a list of 1-d arrays somewhere too. We will need to investigate this after there is an initial PR submitted to work with. I can certainly help with that.
In any case, rather than coupling that code to special case glyph class types with an instance checks, I would suggest your "datadim" approach. However, the place to put that is simply new class attributes on the appropriate glyphs. The datadim value should specify exactly which properties can accept n-d array values:
-
MulitLine,MultiPolygonsandPatchesall havexsandysproperties and these are the properties that could uses 2-d arrays -
"Image" glyphs could potentially also accept 3-d arrays for the
imageproperty (their underlying type is a list of 2-d images*
So, e.g. maybe something like:
class MultiLine(...):
_accept_dims = {'xs': 2, 'ys': 2}
The help code can check glyphcalss._accept_dims and then anything not otherwise specified is assume to accept just ndim=1 by default.
There needs to be a documented convention for how the "split" is done. is a 2d array turned into a list of columns? Or a list of rows? Evidently something is already happening, we need to make sure to document exactly what, and also make sure it is maintained under test. It's possible this could be made configurable as well