Fix for "Non-leaves rows are not permitted in the dataframe" with sunburst diagrams

Open edent opened this issue 3 years ago • 1 comments

It is sometimes useful to have non-leaf data in a Sunburst diagram. However, there is no way to tell Plotly Express to ignore or accept non-leaves.

Minimum viable example:

import pandas as pd
import plotly.express as px
lst = [['Alice', "Bob"], ['Alice', "Bob", "Carrie"], ["Alice", "Bob", "Chuck"]]
df = pd.DataFrame(lst)
fig = px.sunburst(df, path=df.columns)

Gives the error:

ValueError: ('Non-leaves rows are not permitted in the dataframe \n', 0    Alice
1      Bob
2
Name: 0, dtype: object, 'is not a leaf.')

This can be worked around by commenting out the final check in plotly/express/_core.py:

def _check_dataframe_all_leaves(df):
    df_sorted = df.sort_values(by=list(df.columns))
    null_mask = df_sorted.isnull()
    df_sorted = df_sorted.astype(str)
    null_indices = np.nonzero(null_mask.any(axis=1).values)[0]
    for null_row_index in null_indices:
        row = null_mask.iloc[null_row_index]
        i = np.nonzero(row.values)[0][0]
        if not row[i:].all():
            raise ValueError(
                "None entries cannot have not-None children",
                df_sorted.iloc[null_row_index],
            )
    df_sorted[null_mask] = ""
    row_strings = list(df_sorted.apply(lambda x: "".join(x), axis=1))
    #for i, row in enumerate(row_strings[:-1]):
        #if row_strings[i + 1] in row and (i + 1) in null_indices:
            #raise ValueError(
            #    "Non-leaves rows are not permitted in the dataframe \n",
            #    df_sorted.iloc[i + 1],
            #    "is not a leaf.",
            #)

It would be great if px.sunburst could have an option to disable these checks, or to skip over any row which is not a leaf.
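Until such an option exists, the "skip any row which is not a leaf" behaviour can be approximated by filtering the dataframe before it reaches Plotly Express, without patching the library. The following is only a minimal sketch: `drop_non_leaf_rows` is a hypothetical helper name, not part of Plotly, and it assumes missing levels appear only at the end of a row (as in the example above).

```python
import pandas as pd


def drop_non_leaf_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose path is not a prefix of another row."""
    # Reduce each row to a tuple of its non-null entries, e.g. ("Alice", "Bob").
    paths = [
        tuple(v for v in row if pd.notna(v))
        for row in df.itertuples(index=False)
    ]
    # Every proper prefix of a path is an internal (non-leaf) node.
    internal = {p[:k] for p in paths for k in range(1, len(p))}
    # A row is a leaf if its own path never appears as such a prefix.
    keep = [p not in internal for p in paths]
    return df.loc[keep]


lst = [["Alice", "Bob"], ["Alice", "Bob", "Carrie"], ["Alice", "Bob", "Chuck"]]
df = pd.DataFrame(lst)
leaves = drop_non_leaf_rows(df)
```

Here `leaves` contains only the Carrie and Chuck rows, so passing it to `px.sunburst(leaves, path=leaves.columns)` should no longer trip the leaf check. The trade-off is that the dropped `['Alice', 'Bob']` row no longer contributes to Bob's own total.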

How can I propose this as an option?

Thanks!

edent, Feb 08 '22 16:02

For anyone else running into this issue: a workaround I used was to replace every None with "null" (or any other placeholder string) and then remove the corresponding placeholder entries from the figure data before plotting:

import numpy as np
import plotly.express as px

# Fill missing levels with a placeholder; pick a string that never occurs in real labels.
# fillna also catches NaN, which the truthiness test in a lambda would miss.
df = df.fillna("null")
fig = px.icicle(
    df,
    path=df.columns,
)
figure_data = fig["data"][0]

# Keep only the nodes whose id does not contain the placeholder.
mask = np.char.find(figure_data.ids.astype(str), "null") == -1
figure_data.ids = figure_data.ids[mask]
figure_data.values = figure_data.values[mask]
figure_data.labels = figure_data.labels[mask]
figure_data.parents = figure_data.parents[mask]
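
A possible alternative to filtering the trace afterwards is to build the node lists yourself and hand them to a graph_objects trace, which does not go through the Plotly Express leaf check. The sketch below is only an illustration: `build_hierarchy` is a hypothetical helper, the "/"-joined ids are an arbitrary choice, and each node's value is assumed to be the number of rows passing through it.

```python
def build_hierarchy(rows):
    """Hypothetical helper: one node per distinct path prefix, counted by rows."""
    counts = {}
    for row in rows:
        # Ignore missing levels; assumes they only occur at the end of a row.
        path = tuple(str(v) for v in row if v is not None)
        # Every prefix of the path is a node; count the rows that pass through it.
        for k in range(1, len(path) + 1):
            counts[path[:k]] = counts.get(path[:k], 0) + 1
    ids = ["/".join(p) for p in counts]
    labels = [p[-1] for p in counts]
    parents = ["/".join(p[:-1]) for p in counts]  # "" marks a root node
    values = list(counts.values())
    return ids, labels, parents, values


rows = [["Alice", "Bob"], ["Alice", "Bob", "Carrie"], ["Alice", "Bob", "Chuck"]]
ids, labels, parents, values = build_hierarchy(rows)
```

The four lists can then be passed as `ids`, `labels`, `parents` and `values` to `plotly.graph_objects.Sunburst` (or `Icicle`) with `branchvalues="total"`. With that setting, Bob's value of 3 covers his two children plus his own non-leaf row.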

lluissalord, Sep 06 '22 10:09

thanks so much!

nomeutentepippo, Feb 02 '23 14:02