dash-docs
dash-docs copied to clipboard
Document how to convert a pandas multi-index dataframe into a DataTable
Multi-index is the most 👍 issue in https://github.com/plotly/dash-table/issues/414. For full support, it'd be great if we had something like merged cells in the datatable. Until then, let's document how to render this data with cells that are "filled in"
Taking from: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-hierarchical
In [1]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
...: ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
...:
In [2]: tuples = list(zip(*arrays))
In [3]: tuples
Out[3]:
[('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('foo', 'two'),
('qux', 'one'),
('qux', 'two')]
In [4]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
In [5]: index
Out[5]:
MultiIndex([('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('foo', 'two'),
('qux', 'one'),
('qux', 'two')],
names=['first', 'second'])
In [6]: s = pd.Series(np.random.randn(8), index=index)
In [7]: s
Out[7]:
first second
bar one 0.469112
two -0.282863
baz one -1.509059
two -1.135632
foo one 1.212112
two -0.173215
qux one 0.119209
two -1.044236
dtype: float64
in s above, we can show how two methods:
- Repeating "bar" on each row
- Including "bar" on the first row but then using empty strings on the rows below it
As a workaround I am converting the df using the function below. I don't recommend using empty strings because you will lose sort functionality. If you sort you end up with bunch of empty cells that you don't know belonged to which group. This is not the best possible solution or a well written code but does the job. Help me improve it :) :
def df_to_table(df: pd.DataFrame,
id: str = 'table',
col_with_min: str = '80px',
col_with_max: str = '120px',
row_deletable: bool = True,
row_selectable: str = 'single',
exclude_cols: list = [],
cell_conditional_style: list = [],
data_conditional_style: list = [],
style_table: dict = None,
export: bool = False,
page_size: int = 10,
pct_cols: list = None):
"""
Converts a dataframe to dash compatible HTML table
:param df:
:param id:
:param col_with_min:
:param col_with_max:
:param row_deletable:
:param row_selectable:
:param exclude_cols:
:param cell_conditional_style:
:param data_conditional_style:
:param style_table:
:param export:
:param page_size:
:return:
"""
is_multi_index = isinstance(df.columns, pd.core.indexes.multi.MultiIndex)
_id = 'id'
if is_multi_index:
levels = len(df.columns.to_list()[0])
_id = tuple([''] * (levels - 1) + ['id'])
if isinstance(exclude_cols, list):
_exclude_cols = exclude_cols + [_id]
else:
logger.warning(
'Excluded columns should be a list of columns names! The provided value ignored!'
)
_exclude_cols = [_id]
_df = df.copy()
if _id not in df.columns:
_df[_id] = df.index
style_cell_conditional = []
for col in df.columns:
if col == 'id':
continue
if _df[col].dtype == 'O':
style_cell_conditional.append({
'if': {
'column_id': f'{col[-1] if is_multi_index else col}'
},
'textAlign': 'left',
'height': 'auto',
'minHeight': '10px',
# all three widths are needed
'minWidth': col_with_min,
'width': 'auto',
'maxWidth': col_with_max,
'whiteSpace': 'normal'
})
else:
style_cell_conditional.append({
'if': {
'column_id': f'{col[-1] if is_multi_index else col}'
},
'textAlign': 'right',
'whiteSpace': 'normal',
'height': 'auto',
'minHeight': '10px',
# all three widths are needed
'minWidth': col_with_min,
'width': 'auto',
'maxWidth': col_with_max,
})
style_cell_conditional = style_cell_conditional + cell_conditional_style
def set_type(col):
if _df[col].dtype == 'O':
return 'text'
if _df[col].dtype in ('float64', 'int64'):
return 'numeric'
if _df[col].dtype in ('datetime64[ns]'):
return 'datetime'
if is_multi_index:
col_names = [{
"name": list(i),
"id": i[-1],
'type': set_type(i),
"format": Format(
nully='N/A',
precision=2,
scheme=Scheme.fixed,
sign=Sign.positive,
symbol=Symbol.yes,
symbol_suffix='%') if i in pct_cols or '%' in i[-1] else ''
} for i in _df.columns if i not in _exclude_cols]
def row_to_dict(row):
_d = {}
for idx in row.index:
_d[idx[-1]] = row[idx]
return _d
_data = _df.apply(lambda row: row_to_dict(row), axis=1).to_list()
else:
col_names = [{
"name": i,
"id": i,
'type': set_type(i),
"format": Format(
nully='N/A',
precision=2,
scheme=Scheme.fixed,
sign=Sign.positive,
symbol=Symbol.yes,
symbol_suffix='%') if i in pct_cols or '%' in i else ''
} for i in _df.columns if i not in _exclude_cols]
_data = _df.to_dict('records')
style_header = {
'textAlign': 'center',
'fontWeight': 'bold',
'backgroundColor': 'rgb(190,190,190)',
}
style_data_conditional = [{
'if': {
'row_index': 'odd'
},
'backgroundColor': 'rgb(248, 248, 248)',
'whiteSpace': 'normal'
}]
style_data_conditional = style_data_conditional + data_conditional_style
kwargs = dict()
if export:
kwargs = dict(export_columns='all',
export_format='xlsx',
export_headers='display',
style_table={} if style_table is None else style_table)
return ddt.DataTable(
id=id,
columns=col_names,
data=_data,
page_size=page_size,
style_cell_conditional=style_cell_conditional,
merge_duplicate_headers=True,
# style_as_list_view=True,
style_header=style_header,
style_data_conditional=style_data_conditional,
filter_action="native",
sort_action="native",
sort_mode="multi",
row_deletable=row_deletable,
row_selectable=row_selectable,
**kwargs)
However, I am forced to move away from dash due to some restrictions in exporting data tables, and managing routes. I also hit a performance issue when the number of graphs and users increased. I switched to Flask. Still using Plotly.