
Large datasets not working in v5.1.2

Open bchastain opened this issue 2 years ago • 14 comments

Following this example from the docs about dealing with large datasets:

import altair as alt
import pandas as pd

data = pd.DataFrame({"x": range(10000)})
alt.data_transformers.disable_max_rows()
alt.Chart(data).mark_point()

This used to work in old versions of altair, but in 5.1.2 in Jupyter it gives this error:

Javascript Error: Cannot read properties of undefined (reading 'shape') This usually means there's a typo in your chart specification. See the javascript console for the full traceback.

bchastain avatar Oct 10 '23 19:10 bchastain

Thanks for reporting this! I tried using your code just now in Safari and in Chrome and didn't have a problem. Have you tried opening a new notebook (maybe after clearing the browser cache) and trying again? Are you using Jupyter Lab or Jupyter Notebook?

ChristopherDavisUCI avatar Oct 10 '23 20:10 ChristopherDavisUCI

Ah strange, yes I tried in different browsers. This is jupyter notebook, Python 3.9.13, and the following jupyter core package versions:

IPython : 7.31.1
ipykernel : 6.15.2
ipywidgets : 7.6.5
jupyter_client : 7.3.4
jupyter_core : 4.11.1
jupyter_server : 1.18.1
jupyterlab : 3.4.4
nbclient : 0.5.13
nbconvert : 6.4.4
nbformat : 5.5.0
notebook : 6.4.12
qtconsole : 5.2.2
traitlets : 5.1.1

bchastain avatar Oct 10 '23 22:10 bchastain

It works for me too. One thing to try is to close down all jupyter notebooks with an altair chart and then reopen jupyter lab and try again.

joelostblom avatar Oct 10 '23 23:10 joelostblom

@joelostblom's suggestion has also worked for me in the past, although I would have thought opening in different browsers would provide something similar...

I compared my version numbers and many of them are slightly higher:

ipykernel 6.25.2
ipython 8.15.0
jupyter_client 8.3.1
jupyter_core 5.3.1
jupyterlab 4.0.5
nbclient 0.8.0
nbconvert 7.8.0
nbformat 5.9.2
notebook 7.0.3
traitlets 5.10.0

If it turns out that Altair is not compatible with something in your system, it would be good if we could learn that for future reference!

ChristopherDavisUCI avatar Oct 10 '23 23:10 ChristopherDavisUCI

I just set up a new conda env on a completely separate machine with the following versions:

IPython : 8.16.1
ipykernel : 6.25.2
ipywidgets : 8.1.1
jupyter_client : 8.3.1
jupyter_core : 5.3.1
jupyter_server : 2.7.3
jupyterlab : 4.0.6
nbclient : 0.8.0
nbconvert : 7.9.2
nbformat : 5.9.2
notebook : 7.0.4
qtconsole : 5.4.4
traitlets : 5.11.2

altair : 5.1.2
pandas : 2.1.1
python : 3.12.0

and still get the same error as before running the above sample code.
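
A version listing like the ones above can also be collected from inside the notebook kernel itself, which avoids reporting versions from a different environment than the one running the chart. This is only a sketch: `report_versions` is a hypothetical helper name, and the package list is an assumption based on the packages named in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the listings in this thread (an assumption;
# extend or trim as needed).
PACKAGES = ["altair", "pandas", "ipython", "ipykernel", "ipywidgets",
            "jupyter_client", "jupyter_core", "jupyterlab", "notebook"]


def report_versions(packages):
    """Return {distribution name: installed version or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


for name, ver in report_versions(PACKAGES).items():
    print(f"{name:16}: {ver}")
```

Running this in a cell of the affected notebook reports the versions the kernel actually imports.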

bchastain avatar Oct 11 '23 04:10 bchastain

Can you open up the javascript console (F12 in most browsers) and see if there is any additional information there?

joelostblom avatar Oct 11 '23 04:10 joelostblom

Gives the same error as in Jupyter with a bit of a trace:

Uncaught (in promise) Javascript Error: Cannot read properties of undefined (reading 'shape')
This usually means there's a typo in your chart specification. See the javascript console for the full traceback.

Promise.catch (async)
displayChart @ VM33:39
execCb @ require.js?v=d37b48b…cc60411154f593:1693
check @ require.js?v=d37b48b…5cc60411154f593:881
enable @ require.js?v=d37b48b…cc60411154f593:1173
init @ require.js?v=d37b48b…5cc60411154f593:786
(anonymous) @ require.js?v=d37b48b…cc60411154f593:1457
setTimeout (async)
req.nextTick @ require.js?v=d37b48b…cc60411154f593:1812
localRequire @ require.js?v=d37b48b…cc60411154f593:1446
requirejs @ require.js?v=d37b48b…cc60411154f593:1794
(anonymous) @ VM33:44
(anonymous) @ VM33:52
b @ jquery.min.js:2
Pe @ jquery.min.js:2
append @ jquery.min.js:2
OutputArea._safe_append @ outputarea.js:458
OutputArea.append_execute_result @ outputarea.js:497
OutputArea.append_output @ outputarea.js:325
OutputArea.handle_output @ outputarea.js:256
output @ codecell.js:399
Kernel._handle_output_message @ kernel.js:1199
i @ jquery.min.js:2
Kernel._handle_iopub_message @ kernel.js:1239
Kernel._finish_ws_message @ kernel.js:1018
(anonymous) @ kernel.js:1009
Promise.then (async)
Kernel._handle_ws_message @ kernel.js:1009
i @ jquery.min.js:2

bchastain avatar Oct 11 '23 14:10 bchastain

If your notebook contains charts in cell outputs from previous Altair versions, then an old version of Vega-Lite might be loaded. Restarting Jupyter lab/notebook or switching browsers might not be enough to resolve this. Could you try in the following order:

  1. Clear all cell outputs in your notebook
  2. Restart Jupyter lab/notebook
  3. Clear your browser cache
  4. Run the notebook again

binste avatar Oct 12 '23 19:10 binste

It's a new notebook with the above code as literally the only thing in it. I've tried on multiple computers and browsers.

bchastain avatar Oct 12 '23 19:10 bchastain

Hmm, I am not sure what is going wrong here. You mentioned that other versions of Altair worked; if you downgrade or create a new env with a lower version, does it work again for you? Do other examples with disable_max_rows() work? Does it work to use the vegafusion data transformer instead (from the same doc page)?

joelostblom avatar Oct 12 '23 22:10 joelostblom

Hm well actually I thought it worked in a previous version, but to be honest I can't say 100% that I had tested it explicitly like this in a previous version. Is there another example using disable_max_rows() I should try?

bchastain avatar Oct 17 '23 14:10 bchastain

You could try it with any of the examples in the gallery, both those with fewer than and those with more than 5k rows, to understand exactly what is failing. The flights dataset exists in different versions with different numbers of rows (https://altair-viz.github.io/gallery/histogram_responsive.html), or you could just concatenate any of the datasets together to get over and under 5k.

joelostblom avatar Oct 17 '23 17:10 joelostblom

OK maybe I'm completely miscategorizing this issue - maybe it's not a large dataset issue at all, but rather an issue with that sample code. Does the sample code I wrote originally above work for you guys? If I modify it to be something like this, it works fine, but just not in the original definition without encodings:

import altair as alt
import pandas as pd
import numpy as np

x = np.arange(10000)
data = pd.DataFrame({'x': x, 'f(x)': np.sin(x / 5)})

alt.data_transformers.disable_max_rows()
alt.Chart(data).mark_line().encode(
    x='x',
    y='f(x)'
)

bchastain avatar Oct 18 '23 01:10 bchastain

I'm happy it works for you now @bchastain !

The sample code works for me, both when disabling max rows and when using less data without max rows disabled. You should see a single point (or technically many on top of each other) like this:

image

joelostblom avatar Oct 18 '23 01:10 joelostblom