dash-docs icon indicating copy to clipboard operation
dash-docs copied to clipboard

Document how to convert a pandas multi-index dataframe into a DataTable

Open chriddyp opened this issue 5 years ago • 1 comments

Multi-index is the most 👍 issue in https://github.com/plotly/dash-table/issues/414. For full support, it'd be great if we had something like merged cells in the datatable. Until then, let's document how to render this data with cells that are "filled in"

Taking from: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-hierarchical

In [1]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ...:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   ...: 

In [2]: tuples = list(zip(*arrays))

In [3]: tuples
Out[3]: 
[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

In [4]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [5]: index
Out[5]: 
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [6]: s = pd.Series(np.random.randn(8), index=index)

In [7]: s
Out[7]: 
first  second
bar    one       0.469112
       two      -0.282863
baz    one      -1.509059
       two      -1.135632
foo    one       1.212112
       two      -0.173215
qux    one       0.119209
       two      -1.044236
dtype: float64

in s above, we can show how two methods:

  1. Repeating "bar" on each row
  2. Including "bar" on the first row but then using empty strings on the rows below it

chriddyp avatar Aug 27 '20 23:08 chriddyp

As a workaround I am converting the df using the function below. I don't recommend using empty strings because you will lose sort functionality. If you sort you end up with bunch of empty cells that you don't know belonged to which group. This is not the best possible solution or a well written code but does the job. Help me improve it :) :

def df_to_table(df: pd.DataFrame,
                id: str = 'table',
                col_with_min: str = '80px',
                col_with_max: str = '120px',
                row_deletable: bool = True,
                row_selectable: str = 'single',
                exclude_cols: list = [],
                cell_conditional_style: list = [],
                data_conditional_style: list = [],
                style_table: dict = None,
                export: bool = False,
                page_size: int = 10,
                pct_cols: list = None):
    """
    Converts a dataframe to dash compatible HTML table

    :param df:
    :param id:
    :param col_with_min:
    :param col_with_max:
    :param row_deletable:
    :param row_selectable:
    :param exclude_cols:
    :param cell_conditional_style:
    :param data_conditional_style:
    :param style_table:
    :param export:
    :param page_size:
    :return:
    """
    is_multi_index = isinstance(df.columns, pd.core.indexes.multi.MultiIndex)

    _id = 'id'
    if is_multi_index:
        levels = len(df.columns.to_list()[0])
        _id = tuple([''] * (levels - 1) + ['id'])

    if isinstance(exclude_cols, list):
        _exclude_cols = exclude_cols + [_id]
    else:
        logger.warning(
            'Excluded columns should be a list of columns names! The provided value ignored!'
        )
        _exclude_cols = [_id]

    _df = df.copy()

    if _id not in df.columns:
        _df[_id] = df.index

    style_cell_conditional = []
    for col in df.columns:
        if col == 'id':
            continue
        if _df[col].dtype == 'O':
            style_cell_conditional.append({
                'if': {
                    'column_id': f'{col[-1] if is_multi_index else col}'
                },
                'textAlign': 'left',
                'height': 'auto',
                'minHeight': '10px',
                # all three widths are needed
                'minWidth': col_with_min,
                'width': 'auto',
                'maxWidth': col_with_max,
                'whiteSpace': 'normal'
            })
        else:
            style_cell_conditional.append({
                'if': {
                    'column_id': f'{col[-1] if is_multi_index else col}'
                },
                'textAlign': 'right',
                'whiteSpace': 'normal',
                'height': 'auto',
                'minHeight': '10px',
                # all three widths are needed
                'minWidth': col_with_min,
                'width': 'auto',
                'maxWidth': col_with_max,
            })

    style_cell_conditional = style_cell_conditional + cell_conditional_style

    def set_type(col):
        if _df[col].dtype == 'O':
            return 'text'
        if _df[col].dtype in ('float64', 'int64'):
            return 'numeric'
        if _df[col].dtype in ('datetime64[ns]'):
            return 'datetime'

    if is_multi_index:
        col_names = [{
            "name": list(i),
            "id": i[-1],
            'type': set_type(i),
            "format": Format(
                nully='N/A',
                precision=2,
                scheme=Scheme.fixed,
                sign=Sign.positive,
                symbol=Symbol.yes,
                symbol_suffix='%') if i in pct_cols or '%' in i[-1] else ''
        } for i in _df.columns if i not in _exclude_cols]

        def row_to_dict(row):
            _d = {}
            for idx in row.index:
                _d[idx[-1]] = row[idx]
            return _d

        _data = _df.apply(lambda row: row_to_dict(row), axis=1).to_list()
    else:
        col_names = [{
            "name": i,
            "id": i,
            'type': set_type(i),
            "format": Format(
                nully='N/A',
                precision=2,
                scheme=Scheme.fixed,
                sign=Sign.positive,
                symbol=Symbol.yes,
                symbol_suffix='%') if i in pct_cols or '%' in i else ''
        } for i in _df.columns if i not in _exclude_cols]
        _data = _df.to_dict('records')

    style_header = {
        'textAlign': 'center',
        'fontWeight': 'bold',
        'backgroundColor': 'rgb(190,190,190)',
    }

    style_data_conditional = [{
        'if': {
            'row_index': 'odd'
        },
        'backgroundColor': 'rgb(248, 248, 248)',
        'whiteSpace': 'normal'
    }]
    style_data_conditional = style_data_conditional + data_conditional_style
    kwargs = dict()
    if export:
        kwargs = dict(export_columns='all',
                      export_format='xlsx',
                      export_headers='display',
                      style_table={} if style_table is None else style_table)

    return ddt.DataTable(
        id=id,
        columns=col_names,
        data=_data,
        page_size=page_size,
        style_cell_conditional=style_cell_conditional,
        merge_duplicate_headers=True,
        # style_as_list_view=True,
        style_header=style_header,
        style_data_conditional=style_data_conditional,
        filter_action="native",
        sort_action="native",
        sort_mode="multi",
        row_deletable=row_deletable,
        row_selectable=row_selectable,
        **kwargs)

However, I am forced to move away from dash due to some restrictions in exporting data tables, and managing routes. I also hit a performance issue when the number of graphs and users increased. I switched to Flask. Still using Plotly.

shsab avatar Oct 30 '20 16:10 shsab