jupyterlab_voyager
Open table in Voyager option producing incorrect data
This issue concerns the Open Table in Voyager option.
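Here is the code to reproduce this issue:
[image: image] https://user-images.githubusercontent.com/9298611/42351178-83dad3fa-8068-11e8-9524-a096ae8afa4e.png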
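Currently, with the code above, I get this result when I right-click the pandas DataFrame and choose the Open Table in Voyager option:
[image: image] https://user-images.githubusercontent.com/9298611/42351187-934c6b14-8068-11e8-90b5-418056d90c8e.png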
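The data in the related views is not correct; it should look more like this (just open cars.json with Voyager):
[image: image] https://user-images.githubusercontent.com/9298611/42351220-b9f18326-8068-11e8-9993-4cb2efd49110.png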
As you can see, the two sets of charts differ even though they should be identical. For instance, the Origin vs. Number of Records bar chart produced via the pandas DataFrame route differs from the one produced by opening the JSON file in Voyager.
Good catch!
-- Brian E. Granger (@ellisonbg on Twitter and GitHub), Associate Professor of Physics and Data Science, Cal Poly State University, San Luis Obispo; replying by email to Shaheen Sharifian's report of Thu, Jul 5, 2018, 3:35 PM
Thanks, @ssharif6, I'll look into it. Maybe the underlying DataFrame we extract has a different structure?
Hi @ssharif6, it turns out to be an issue with the pandas DataFrame default display settings. If not specified otherwise, pandas will only display the first 30 rows plus the last 30 rows, so JupyterLab_Voyager is only able to get these 60 rows of data instead of the full dataset (that's why you see a difference).
Without major changes, extracting the whole dataset from this 'partial' table is almost impossible. So an easy solution would be to just change the pandas settings:
import pandas as pd
pd.set_option('display.max_columns', None)  # render every column
pd.set_option('display.max_rows', None)  # render every row
In this way, the whole frame is displayed, and 'Open Table in Voyager' will get the correct data.
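For anyone who wants to see the truncation directly, here is a minimal sketch. It is not the extension's actual code; it assumes the extension only sees the HTML table that pandas renders for the notebook. `BodyRowCounter` and `rendered_rows` are hypothetical helper names, and the sample frame is made up. It counts the body rows in the rendered HTML with the row limit set to 60 and again with the limit lifted.

```python
from html.parser import HTMLParser

import pandas as pd


class BodyRowCounter(HTMLParser):
    """Counts <tr> elements that appear inside <tbody> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_body = True
        elif tag == "tr" and self.in_body:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_body = False


def rendered_rows(df):
    """Return how many body rows the notebook HTML repr of df contains."""
    counter = BodyRowCounter()
    counter.feed(df._repr_html_())
    return counter.rows


# Made-up sample data, larger than the 60-row display limit.
df = pd.DataFrame({"Origin": ["USA", "Europe", "Japan"] * 100,
                   "Horsepower": range(300)})

# With a 60-row limit the rendered table is truncated; the exact number of
# rows shown depends on the pandas version.
with pd.option_context("display.max_rows", 60,
                       "display.notebook_repr_html", True):
    truncated = rendered_rows(df)

# With the limit lifted, every row is rendered.
with pd.option_context("display.max_rows", None,
                       "display.notebook_repr_html", True):
    full = rendered_rows(df)

print(truncated, full, len(df))
```

Using `pd.option_context` instead of `pd.set_option` keeps the change scoped to the `with` block, so the settings are restored afterwards.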
Is Open Table in Voyager just scraping the HTML output table? If so, I don't think that's an appropriate method to get the underlying data.
You can't simply change the display of the output because printing large DataFrames will then effectively hang your kernel session.
True, that's definitely not a good solution. But currently the Jupyter notebook doesn't expose the source dataset in the cell output, so this extension has no easy way to find and access the data directly unless we modify the notebook itself to add some APIs.
^^^ I think this is a general design question that Jupyter needs to answer properly.
I guess a big aspect of this is how you communicate the data from Python/R/whatever objects to the frontend in JavaScript. I had thought Arrow might fit the bill there.
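As a rough illustration of shipping the data itself rather than its rendered table, here is a sketch that serializes the whole frame on the kernel side. To be clear, it uses plain JSON records rather than Arrow, only because that needs nothing beyond pandas. `frame_to_records_json` is a hypothetical helper name, the sample frame is made up, and whether the extension could consume such a payload is an assumption.

```python
import json

import pandas as pd


def frame_to_records_json(df: pd.DataFrame) -> str:
    """Serialize every row of df as a JSON array of records (hypothetical helper)."""
    return df.to_json(orient="records")


# Made-up sample data, larger than the 60-row display limit.
df = pd.DataFrame({"Origin": ["USA", "Europe", "Japan"] * 50,
                   "Horsepower": range(150)})

payload = frame_to_records_json(df)
records = json.loads(payload)

# The payload carries all rows, regardless of the display options.
print(len(records), records[0])
```

As far as I know, an array of records is the same layout cars.json uses, so writing this payload to a .json file and opening that file with Voyager should sidestep the display truncation, at the cost of an extra manual step.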
IMHO it would be great if the Open in Voyager context menu could be integrated with the variable inspector. They're already doing something similar by providing a Phosphor datagrid view for NumPy arrays. Although it's not yet an officially supported extension, I think it would make sense for it to be in the future; it's one of the most requested items from my users.