Data Explorer breaks when dataframe cell has complex data in it
Repro: run the following in a cell
import pandas as pd
pd.set_option("display.html.table_schema", True)

class Cmd:
    def __init__(self, name, params):
        self.name = name
        self.params = params

    def __repr__(self):
        return f'Cmd(name={self.name}, params={self.params})'

cell_payload = [
    Cmd(name='foo', params={'bar', 'baz'}),
    Cmd(name='foo', params={'bar', 'baz'})
]
pd.DataFrame({'param_session': [cell_payload]})
Then the following error appears, with a link to an error page that reads: "Objects are not valid as a React child (found: object with keys {name}). If you meant to render a collection of children, use an array instead."

For reference, this is how Pandas would normally render the cell with pd.set_option("display.html.table_schema", False):

Finally, here's what the output looks like in the ipynb file when the error occurs
  "application/vnd.dataresource+json": {
    "schema": {
      "fields": [
        {
          "name": "index",
          "type": "integer"
        },
        {
          "name": "param_session",
          "type": "string"
        }
      ],
      "primaryKey": [
        "index"
      ],
      "pandas_version": "0.20.0"
    },
    "data": [
      {
        "index": 0,
        "param_session": [
          {
            "name": "foo"
          },
          {
            "name": "foo"
          }
        ]
      }
    ]
  }
},
@emeeks Is this ringing any bells for you?
From what I understand, the values inside "data" in the output are inserted as children of the React component that renders each cell, and the problem arises when a value is a dictionary (or a list of dictionaries) rather than a primitive.
So I'm thinking that currently, the value
[
  {
    "name": "foo"
  },
  {
    "name": "foo"
  }
]
is just being inlined in React, but it should probably be turned into a string before inlining.
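To illustrate the "turn it into a string first" idea, here is a minimal sketch on the Python side. It is not how data-explorer or pandas behaves today; the helper name `stringify_complex_cells` is hypothetical, and it assumes the rows look like the "data" array in the output above.

```python
import json

def stringify_complex_cells(rows):
    """Return a copy of rows where dict/list cell values become JSON strings.

    Hypothetical helper: primitives are left untouched, and anything that
    json cannot serialize falls back to its repr().
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            key: json.dumps(value, default=repr)
            if isinstance(value, (dict, list))
            else value
            for key, value in row.items()
        })
    return cleaned

# Same shape as the "data" array from the notebook output above.
rows = [{"index": 0, "param_session": [{"name": "foo"}, {"name": "foo"}]}]
safe_rows = stringify_complex_cells(rows)
print(safe_rows[0]["param_session"])  # [{"name": "foo"}, {"name": "foo"}]
```

With every cell reduced to a primitive or a string, nothing object-shaped would be handed to React as a child.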
@jruales Are you able to repro this with the raw data explorer component? I wonder if it has something to do with the way we wrap it in the output.
cc: @willingc
I was able to reproduce @jruales's issue outside of Jupyter. The issue persists regardless of whether the schema type is set to object or array.
Demo: https://codesandbox.io/s/pedantic-hodgkin-78o80?file=/src/App.js:216-221
@emeeks what do you think about changing data-explorer to accept an object column type and stringify those cells internally, vs asking callers of data-explorer to transform object cells into strings before passing them in? We have at least 2 options:
- If the column is actually an array or object type per the Frictionless data spec, call JSON.stringify on it to avoid this React error when displaying these cells in tables. This will make the object values displayable in the table, but they won't be used in any of the actual visualizations. Somewhere in the Python binding code, the field type should be changed from string to object or array.
- Data explorer drops any Frictionless spec column types that it doesn't recognize (e.g. keeping just date/boolean/number/string).
:rocket: Issue was released in v8.2.11 :rocket:
Reopening: while #65 fixes the issue for JavaScript consumers when the schema type for these complex columns is set to object instead of string, a separate fix (maybe a separate issue) is needed to get the pandas code to set the column type correctly.
I tried to reproduce this issue in my local jupyterlab, but found it wasn't working with the latest version.

I think the data-explorer package (which hasn't been updated in a year) is getting the data in from here, but I'm not sure how to track down where the Frictionless data spec is generated (perhaps it is coming from something in the Python code). Once we find that, we'll want a way to get it to set the column type properly (Pandas correctly has it set as an object, based on the screencap below).

@jruales did you run into this issue while using Jupyter Lab or Jupyter Notebook?
I decided to have a look at the Pandas documentation, and found the root of the issue.
https://pandas.pydata.org/docs/user_guide/io.html#table-schema
The column type for a Pandas object column is set to a Frictionless spec string rather than an Object.
https://sourcegraph.com/github.com/pandas-dev/pandas@dad3e7fc3a2a75ba5f330899be0639cff0f73f6c/-/blob/pandas/io/json/_table_schema.py?L62-89
I think we actually want this to be returning a Frictionless object instead.
https://sourcegraph.com/github.com/pandas-dev/pandas@dad3e7fc3a2a75ba5f330899be0639cff0f73f6c/-/blob/pandas/core/dtypes/common.py?L532-571
During the serialization/deserialization process on the way to Jupyter, the string contents are turned back into a JSON object, so the value is no longer a string by the time it reaches the data-explorer. There is also no metadata that could be used to distinguish what was originally a string from a list of Python objects. Related reading about strings and objects
from pandas import DataFrame

df = DataFrame(
    {
        "A": ["a", "b", "c"],
        "B": [{"a": 1}, {"b": 1}, {"c": 1}],
    }
)
col_types = df.dtypes
# strings and object columns are treated the same way in Pandas
col_types.iloc[0] == col_types.iloc[1]  # this returns true :(
This issue was brought up when Table Schema was implemented in Pandas, but object ultimately didn't get supported as a special data type.
https://github.com/pandas-dev/pandas/pull/14904#discussion_r99501336
There might be a "sniffing heuristic" that we could apply at the JavaScript or Python level: if a column is labeled as a string at the Frictionless level but actually contains a JSON object in every cell, we could treat the column as a Frictionless spec object instead.
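A rough sketch of what such a heuristic could look like on the Python side. This is only an illustration under assumptions: the function name `sniff_field_types` is hypothetical, the payload is assumed to have the same "schema"/"data" shape as the notebook output earlier in this thread, and the choice to require every non-null cell to match is one possible rule among several.

```python
def sniff_field_types(payload):
    """Return schema fields with 'string' upgraded where the cells disagree.

    Hypothetical heuristic: a field declared as 'string' becomes 'object'
    if every non-null cell is a dict, or 'array' if every non-null cell
    is a list. All other fields are returned unchanged.
    """
    rows = payload["data"]
    fields = []
    for field in payload["schema"]["fields"]:
        field = dict(field)  # copy so the input payload is not mutated
        if field["type"] == "string":
            values = [row.get(field["name"]) for row in rows]
            values = [v for v in values if v is not None]
            if values and all(isinstance(v, dict) for v in values):
                field["type"] = "object"
            elif values and all(isinstance(v, list) for v in values):
                field["type"] = "array"
        fields.append(field)
    return fields

# Payload shaped like the failing notebook output.
payload = {
    "schema": {"fields": [
        {"name": "index", "type": "integer"},
        {"name": "param_session", "type": "string"},
    ]},
    "data": [
        {"index": 0, "param_session": [{"name": "foo"}, {"name": "foo"}]},
    ],
}
sniffed = sniff_field_types(payload)
print([f["type"] for f in sniffed])  # ['integer', 'array']
```

Scanning every cell is linear in the number of rows, so a real implementation might sample instead; a column that mixes real strings with objects is deliberately left as string here.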