data-explorer Data Explorer breaks when dataframe cell has complex data in it

Repro: run the following in a cell

import pandas as pd
pd.set_option("display.html.table_schema", True)

class Cmd:
    def __init__(self, name, params):
        self.name = name
        self.params = params
    def __repr__(self):
        return f'Cmd(name={self.name}, params={self.params})'

cell_payload = [
    Cmd(name='foo', params={'bar', 'baz'}),
    Cmd(name='foo', params={'bar', 'baz'})
]
pd.DataFrame({'param_session': [cell_payload]})

Then the following error appears, with a link to this error page. The page reports the error as: Objects are not valid as a React child (found: object with keys {name}). If you meant to render a collection of children, use an array instead.

For reference, this is how Pandas would normally render the cell, when setting pd.set_option("display.html.table_schema", False)

Finally, here's what the output looks like in the ipynb file when the error occurs

            "application/vnd.dataresource+json": {
              "schema": {
                "fields": [
                  {
                    "name": "index",
                    "type": "integer"
                  },
                  {
                    "name": "param_session",
                    "type": "string"
                  }
                ],
                "primaryKey": [
                  "index"
                ],
                "pandas_version": "0.20.0"
              },
              "data": [
                {
                  "index": 0,
                  "param_session": [
                    {
                      "name": "foo"
                    },
                    {
                      "name": "foo"
                    }
                  ]
                }
              ]
            }
          },

Apr 07 '21 03:04 jruales

@emeeks Is this ringing any bells for you?

Apr 08 '21 22:04 captainsafia

From what I understand, whatever values are inside "data" in the output get inserted as children into the React component for the cell, and the error arises when one of those values is a dictionary.

So I'm thinking that currently, the

                  [
                    {
                      "name": "foo"
                    },
                    {
                      "name": "foo"
                    }
                  ]

is just being inlined in React, but it should probably be turned into a string before inlining.
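To make the "turn it into a string first" idea concrete, here is a minimal Python sketch of the transformation on the caller side. The helper name stringify_complex_cells is made up for illustration and is not part of data-explorer or pandas; the actual fix in the component would presumably use JSON.stringify in JavaScript.

```python
import json


def stringify_complex_cells(rows):
    """Return a copy of `rows` where dict/list cell values become JSON strings.

    Hypothetical helper for illustration only; scalar cells are left untouched.
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in row.items()
        })
    return cleaned


# The "data" rows from the notebook output in this issue.
rows = [{"index": 0, "param_session": [{"name": "foo"}, {"name": "foo"}]}]
safe_rows = stringify_complex_cells(rows)
print(safe_rows[0]["param_session"])  # now a plain string that can be rendered as text
```

After this step every cell is a scalar, so nothing object-shaped would be handed to React as a child.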

Apr 19 '21 09:04 jruales

@jruales Are you able to repro this with the raw data explorer component? I wonder if it has something to do with the way we wrap it in the output.

cc: @willingc

Apr 20 '21 18:04 captainsafia

I was able to reproduce @jruales's issue outside of Jupyter. The issue persists regardless of whether the schema type is set to object or array.

Demo: https://codesandbox.io/s/pedantic-hodgkin-78o80?file=/src/App.js:216-221

@emeeks what do you think about changing data-explorer to accept a column type of type object which stringifies the cell internally, vs asking callers of data-explorer to transform object cells into strings before passing them in? We have at least 2 options:

1. If the column is actually an array or object type per the Frictionless data spec, call JSON.stringify on it to avoid this React error when displaying these cells in tables. This makes the object values displayable in the table, but they won't be used in any of the actual visualizations. Somewhere in the Python binding code, the field type should be changed from string to object or array.
2. Data explorer drops any Frictionless spec column types that it doesn't recognize (e.g. it keeps just date/boolean/number/string).
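For comparison, the second option (dropping unrecognized column types) could look roughly like the sketch below, written in Python against the table-schema payload shape shown earlier. The function name drop_unsupported_columns and the exact contents of SUPPORTED_TYPES are assumptions for illustration, not the list data-explorer actually uses.

```python
# Assumed set of types the explorer can display; the real list may differ.
SUPPORTED_TYPES = {"string", "number", "integer", "boolean", "date", "datetime"}


def drop_unsupported_columns(payload, supported=SUPPORTED_TYPES):
    """Return a new payload without fields (and cells) of unsupported types.

    Hypothetical helper for illustration only.
    """
    fields = [f for f in payload["schema"]["fields"] if f["type"] in supported]
    kept_names = {f["name"] for f in fields}
    data = [
        {key: value for key, value in row.items() if key in kept_names}
        for row in payload["data"]
    ]
    return {"schema": {**payload["schema"], "fields": fields}, "data": data}


payload = {
    "schema": {
        "fields": [
            {"name": "index", "type": "integer"},
            {"name": "param_session", "type": "array"},
        ],
        "primaryKey": ["index"],
    },
    "data": [{"index": 0, "param_session": [{"name": "foo"}, {"name": "foo"}]}],
}
filtered = drop_unsupported_columns(payload)
print([f["name"] for f in filtered["schema"]["fields"]])
```

Note that this only helps once the schema labels the column as array or object; while pandas labels it as string, the column would pass through the filter unchanged.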

Jun 29 '21 01:06 hydrosquall

:rocket: Issue was released in v8.2.11 :rocket:

Jul 01 '21 01:07 github-actions[bot]

Reopening: #65 fixes the issue for JavaScript consumers when the schema type for these complex columns is set to object instead of string, but a separate fix (maybe a separate issue) is needed to get the pandas code to set the column type correctly.

Jul 01 '21 01:07 hydrosquall

I tried to reproduce this issue in my local jupyterlab, but found it wasn't working with the latest version.

Image 2021-07-02 at 1 36 14 PM

I think the data-explorer package (which hasn't been updated in a year) is getting the data in from here, but I'm not sure how to track down where the Frictionless data spec is generated (perhaps it comes from something in the Python code). Once we find it, we'll want a way to get it to set the column type properly (Pandas has it correctly set as an object, based on the screencap below).

Image 2021-07-02 at 1 45 48 PM

@jruales did you run into this issue while using Jupyter Lab or Jupyter Notebook?

Jul 02 '21 17:07 hydrosquall

I decided to have a look at the Pandas documentation, and found the root of the issue.

https://pandas.pydata.org/docs/user_guide/io.html#table-schema

The column type for a Pandas object column is set to a Frictionless spec string rather than an Object.

https://sourcegraph.com/github.com/pandas-dev/pandas@dad3e7fc3a2a75ba5f330899be0639cff0f73f6c/-/blob/pandas/io/json/_table_schema.py?L62-89

I think we actually want this to be returning a Frictionless object instead.

https://sourcegraph.com/github.com/pandas-dev/pandas@dad3e7fc3a2a75ba5f330899be0639cff0f73f6c/-/blob/pandas/core/dtypes/common.py?L532-571

During the serialization/deserialization process to Jupyter, the cell contents are turned back into a JSON object, so the value is no longer a string by the time it reaches data-explorer. There is also no metadata that could be used to tell apart what was originally a string from what was a list of Python objects. Related reading about strings and objects

from pandas import DataFrame

df = DataFrame(
    {
        "A": ["a", "b", "c"],
        "B": [{"a": 1}, {"b": 1}, {"c": 1}],
    }
)
col_types = df.dtypes
# string and object columns are treated the same way in Pandas
col_types.iloc[0] == col_types.iloc[1]  # this returns True :(
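Both halves of the mismatch can be checked directly from Python. This is a small sketch using build_table_schema and to_json(orient="table") from pandas, with plain dicts standing in for the Cmd objects from the original repro (an assumption made to keep the example short):

```python
import json

import pandas as pd
from pandas.io.json import build_table_schema

# One row whose single cell holds a list of dicts, like the repro above.
df = pd.DataFrame({"param_session": [[{"name": "foo"}, {"name": "foo"}]]})

# What the Table Schema says about the column.
schema = build_table_schema(df)
field_types = {field["name"]: field["type"] for field in schema["fields"]}
print(field_types["param_session"])

# What the serialized data actually contains for that cell.
payload = json.loads(df.to_json(orient="table"))
cell = payload["data"][0]["param_session"]
print(type(cell).__name__, cell)
```

The schema claims a string while the data carries a nested JSON array, which is the shape that reaches data-explorer.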

This issue was brought up when Table Schema was implemented in Pandas, but object ultimately didn't get supported as a special data type. https://github.com/pandas-dev/pandas/pull/14904#discussion_r99501336

There might be a "sniffing heuristic" that we could apply at the JavaScript or Python level: if a column is labeled as a string at the Frictionless level but actually contains a JSON object in every cell, we could treat the column as a Frictionless spec object instead.
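As a starting point for discussion, a sniffing pass on the Python side might look like the sketch below. The function name sniff_complex_columns is hypothetical, and the rule it applies (retype a "string" field only when every non-null cell is a dict, or every non-null cell is a list) is just one possible heuristic, not something either library does today.

```python
def sniff_complex_columns(payload):
    """Retype "string" fields whose non-null cells are all dicts or all lists.

    Hypothetical helper for illustration only. Returns a new payload; the
    data rows are passed through unchanged.
    """
    rows = payload["data"]
    fields = []
    for field in payload["schema"]["fields"]:
        field = dict(field)  # copy so the input schema is not mutated
        if field["type"] == "string":
            cells = [row.get(field["name"]) for row in rows]
            cells = [cell for cell in cells if cell is not None]
            if cells and all(isinstance(cell, dict) for cell in cells):
                field["type"] = "object"
            elif cells and all(isinstance(cell, list) for cell in cells):
                field["type"] = "array"
        fields.append(field)
    return {"schema": {**payload["schema"], "fields": fields}, "data": rows}


payload = {
    "schema": {
        "fields": [
            {"name": "index", "type": "integer"},
            {"name": "param_session", "type": "string"},
            {"name": "label", "type": "string"},
        ],
        "primaryKey": ["index"],
    },
    "data": [
        {
            "index": 0,
            "param_session": [{"name": "foo"}, {"name": "foo"}],
            "label": "a",
        }
    ],
}
sniffed = sniff_complex_columns(payload)
sniffed_types = {f["name"]: f["type"] for f in sniffed["schema"]["fields"]}
print(sniffed_types)
```

Mixed columns (some strings, some objects) are deliberately left as "string" here; whether those should be stringified or retyped is an open question.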

Jul 05 '21 23:07 hydrosquall