
Sunburst and Treemap graph does not render with `branchvalues="total"`

Open ProphetLamb opened this issue 1 year ago • 3 comments

Good evening everyone. I was writing up a small article when I ran into a problem with the branchvalues="total" setting on px.sunburst and px.treemap.

The bug: the plot renders as whitespace only.

Changing the branchvalues parameter back to "remainder" renders the graph as expected. But since my parent rows already hold the summed views, "remainder" is the wrong mode for this data.

Can anyone reproduce these results? Any ideas what I can do to circumvent this issue?

Thanks in advance! Cheers

Minimum working sample

Treemap branchvalues="total"

Treemap branchvalues="remainder"

Sunburst branchvalues="total"

Sunburst branchvalues="remainder"

Intermediate Dataset

| path | score | views | parent |
| --- | --- | --- | --- |
| music/pop/jackson/billie_jean.mp3 | 0.8000 | 1000 | music/pop/jackson |
| music/pop/jackson/beat_it.mp3 | 0.9000 | 2000 | music/pop/jackson |
| music/pop/abba/dancing_queen.mp3 | 0.7000 | 1500 | music/pop/abba |
| music/pop/abba/voulez-vous/voulez-vous.mp3 | 0.7500 | 1500 | music/pop/abba/voulez-vous |
| music/pop/abba/voulez-vous/summer_night_city.mp3 | 0.8000 | 1500 | music/pop/abba/voulez-vous |
| music/pop/abba/waterloo.mp3 | 0.8000 | 1500 | music/pop/abba |
| music/pop/abba/chiquitita.mp3 | 0.7000 | 1500 | music/pop/abba |
| music/pop/abba/s.o.s.mp3 | 0.7000 | 1500 | music/pop/abba |
| music/rock/queen/bohemian_rhapsody.mp3 | 0.9000 | 3000 | music/rock/queen |
| music/pop/abba | 0.7000 | 6000 | music/pop |
| music/pop/abba/voulez-vous | 0.7750 | 3000 | music/pop/abba |
| music/pop/jackson | 0.8500 | 3000 | music/pop |
| music/rock/queen | 0.9000 | 3000 | music/rock |
| music/pop | 0.7750 | 9000 | music |
| music/rock | 0.9000 | 3000 | music |
| music | 0.8375 | 12000 | None |

Source code

# %%
import typing as t
import pandas as pd
import plotly.express as px

# %%
data = pd.DataFrame([
  { 'path': 'music/pop/jackson/billie_jean.mp3', 'score': 0.8, 'views': 1000 },
  { 'path': 'music/pop/jackson/beat_it.mp3', 'score': 0.9, 'views': 2000 },
  { 'path': 'music/pop/abba/dancing_queen.mp3', 'score': 0.7, 'views': 1500 },

  { 'path': 'music/pop/abba/voulez-vous/voulez-vous.mp3', 'score': 0.75, 'views': 1500 },
  { 'path': 'music/pop/abba/voulez-vous/summer_night_city.mp3', 'score': 0.8, 'views': 1500 },
  { 'path': 'music/pop/abba/waterloo.mp3', 'score': 0.8, 'views': 1500 },
  { 'path': 'music/pop/abba/chiquitita.mp3', 'score': 0.7, 'views': 1500 },
  { 'path': 'music/pop/abba/s.o.s.mp3', 'score': 0.7, 'views': 1500 },  
  { 'path': 'music/rock/queen/bohemian_rhapsody.mp3', 'score': 0.9, 'views': 3000 },
])

# %%

col_path: str = 'path'
col_parent: str = 'parent'
def path_parent_fn(path):
  path = path.split('/')
  path = '/'.join(path[:-1]) if len(path) > 0 else ''
  path = path.strip()
  return path if len(path) > 0 else None 

aggregation = { 'score': 'median', 'views': 'sum' }


# %%
PathT = t.TypeVar('PathT')
Axis = t.Union[int, str]

def create_hierarchy_data(
  data: pd.DataFrame,
  col_path: Axis,
  col_parent: Axis,
  path_parent_fn: t.Callable[[PathT], t.Union[PathT, None]],
  aggregation: t.Any
):
  data[col_parent] = data[col_path].apply(path_parent_fn)

  def parent_in_data_or_na():
    return data[col_parent].isin(data[col_path]) | data[col_parent].isna()
  
  while not parent_in_data_or_na().all(skipna=True):
    missing_parents = data[~data[col_parent].isin(data[col_path])]
    missing_parents = missing_parents.groupby(col_parent, as_index=False)
    missing_parents_keys = missing_parents.groups.keys()
    missing_parents = missing_parents.agg(aggregation)
    missing_parents[col_path] = missing_parents_keys
    missing_parents[col_parent] = missing_parents[col_path].apply(path_parent_fn)
    data = pd.concat([
      data,
      missing_parents
    ], ignore_index=True)
  data = data[data[col_path].notna()]
  return data


# %%
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
data

# %%

data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='views', color='score', color_continuous_midpoint=0.5, branchvalues='total')
fig

# %%
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.sunburst(data, names=col_path, parents=col_parent, values='views', color='score', color_continuous_midpoint=0.5, branchvalues='total')
fig



ProphetLamb • Aug 02 '23 22:08

The only way I found to render this data is to use branchvalues='remainder' with zero values in every node except the leaves.

Applying this workaround means adding an aggregation whose result is always zero, so every generated parent node gets a layout value of zero:

aggregation = { 'score': 'median', 'views': 'sum', 'layout_value': lambda x: 0 }

The data['layout_value'] column is initialized from the existing value column, data['views'], and is then used in place of views as the values argument when rendering the figure:

data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, branchvalues='remainder')
fig

In this figure every generated node has a value of zero, but branchvalues='remainder' ensures that each generated parent node takes on the combined size of its child nodes.

Zero remainder treemap
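
To make the arithmetic behind this workaround concrete, here is a minimal sketch in plain Python. It assumes the documented meaning of branchvalues='remainder': a node's displayed size is its own value plus the sizes of its children. The helper remainder_sizes and the sample rows (only the abba subtree of the dataset above, treated as its own root) are illustrative and not part of the original code.

```python
def remainder_sizes(rows, value_key):
    """Displayed size per node under branchvalues='remainder' (assumed semantics):
    the node's own value plus the displayed sizes of its direct children."""
    own = {row["path"]: row[value_key] for row in rows}
    children = {}
    for row in rows:
        children.setdefault(row["parent"], []).append(row["path"])

    def size(path):
        return own[path] + sum(size(child) for child in children.get(path, []))

    return {path: size(path) for path in own}


ABBA = "music/pop/abba"
VOULEZ = ABBA + "/voulez-vous"

# Generated parent rows: views as in the intermediate dataset, layout_value 0.
rows = [
    {"path": ABBA, "parent": None, "views": 6000, "layout_value": 0},
    {"path": VOULEZ, "parent": ABBA, "views": 3000, "layout_value": 0},
]
# Leaf rows: layout_value is simply a copy of views.
leaves = [
    (ABBA, "dancing_queen.mp3"), (ABBA, "waterloo.mp3"),
    (ABBA, "chiquitita.mp3"), (ABBA, "s.o.s.mp3"),
    (VOULEZ, "voulez-vous.mp3"), (VOULEZ, "summer_night_city.mp3"),
]
for parent, name in leaves:
    rows.append({"path": f"{parent}/{name}", "parent": parent,
                 "views": 1500, "layout_value": 1500})

print(remainder_sizes(rows, "layout_value")[ABBA])  # zeros on generated nodes
print(remainder_sizes(rows, "views")[ABBA])         # summed parents counted again
```

With zeros on the generated nodes, the abba branch comes out at the sum of its six tracks. Feeding the already-summed views into 'remainder' counts the children a second time.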

Finally, adding a custom hovertemplate that shows the real views value instead of the layout workaround value yields a beautiful graph.

data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, hover_data=['views', 'score'])
fig.update_traces(hovertemplate='''
<b>%{label}</b><br>Votes: %{customdata[0]}<br>Score: %{customdata[1]}
''')
fig

Zero remainder treemap with custom hovertemplate

Even with this workaround, which undercuts the "low-code" claim Plotly makes, a fix would still be great!
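
A possible alternative to the zero-value workaround is sketched below. It assumes branchvalues='total' needs every parent's value to cover the sum of its children. In the intermediate dataset above, music/pop/abba holds 6000 while its children (four tracks plus the voulez-vous folder) sum to 9000. The sketch adds each leaf's views to all of its ancestors instead of aggregating one level at a time. The name build_total_rows is hypothetical, and only views is handled; score would need its own aggregation.

```python
from collections import defaultdict


def build_total_rows(leaf_views):
    """Given {leaf_path: views}, return one row per leaf and per ancestor
    directory, where a directory's views is the sum of all leaves beneath it."""
    totals = defaultdict(int)
    for path, views in leaf_views.items():
        parts = path.split("/")
        # Credit the leaf's views to itself and to every ancestor prefix.
        for depth in range(1, len(parts) + 1):
            totals["/".join(parts[:depth])] += views

    rows = []
    for path, views in totals.items():
        parent = path.rsplit("/", 1)[0] if "/" in path else None
        rows.append({"path": path, "parent": parent, "views": views})
    return rows


leaf_views = {
    "music/pop/jackson/billie_jean.mp3": 1000,
    "music/pop/jackson/beat_it.mp3": 2000,
    "music/pop/abba/dancing_queen.mp3": 1500,
    "music/pop/abba/voulez-vous/voulez-vous.mp3": 1500,
    "music/pop/abba/voulez-vous/summer_night_city.mp3": 1500,
    "music/pop/abba/waterloo.mp3": 1500,
    "music/pop/abba/chiquitita.mp3": 1500,
    "music/pop/abba/s.o.s.mp3": 1500,
    "music/rock/queen/bohemian_rhapsody.mp3": 3000,
}

rows = build_total_rows(leaf_views)
by_path = {row["path"]: row for row in rows}
print(by_path["music/pop/abba"]["views"], by_path["music"]["views"])
```

The resulting rows could be wrapped in pd.DataFrame(rows) and passed to px.treemap in the same way as the data above.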

ProphetLamb • Aug 03 '23 06:08

I also have this issue, but it seems to be dataset-dependent.

My real dataset has 3 levels and highly variable values, from 0 to trillions. This doesn't render. However, if I remove the 3rd level, the first 2 levels render fine.

When I use a test dataset that also has 3 levels but less variability, with the exact same code, it works just fine with all 3 levels.
So the issue seems to be data-dependent. I haven't yet managed to isolate which property of the data makes it fail.
Is there any way to get some plotly error logs to understand where the render fails?

NicolasPA • Nov 30 '23 08:11

OK, so my issue was that the sums did not add up correctly from the 3rd level in my real dataset.
You may want to check whether that is the case for you too.

I wish Plotly raised an error for this kind of render-breaking mistake.
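
A small guard along these lines can surface the problem before plotting. This is a sketch in plain Python, not something Plotly provides. find_inconsistent_totals and assert_totals_consistent are hypothetical names. The rule checked is the reading suggested by this thread: under branchvalues='total', a parent's value must not be smaller than the sum of its direct children. The sample rows are a slice of the intermediate dataset from the original post.

```python
def find_inconsistent_totals(rows, tolerance=1e-9):
    """Return (path, own_value, children_sum) for every node whose own value
    is smaller than the sum of its direct children's values."""
    values = {row["path"]: row["views"] for row in rows}
    children_sum = {}
    for row in rows:
        parent = row["parent"]
        if parent is not None:
            children_sum[parent] = children_sum.get(parent, 0) + row["views"]
    return [
        (path, values[path], total)
        for path, total in children_sum.items()
        if path in values and values[path] + tolerance < total
    ]


def assert_totals_consistent(rows):
    """Raise ValueError instead of letting the figure render blank."""
    bad = find_inconsistent_totals(rows)
    if bad:
        details = ", ".join(f"{p}: {own} < {total}" for p, own, total in bad)
        raise ValueError(f"parent value smaller than sum of children: {details}")


abba = "music/pop/abba"
rows = [
    {"path": f"{abba}/dancing_queen.mp3", "views": 1500, "parent": abba},
    {"path": f"{abba}/voulez-vous/voulez-vous.mp3", "views": 1500, "parent": f"{abba}/voulez-vous"},
    {"path": f"{abba}/voulez-vous/summer_night_city.mp3", "views": 1500, "parent": f"{abba}/voulez-vous"},
    {"path": f"{abba}/waterloo.mp3", "views": 1500, "parent": abba},
    {"path": f"{abba}/chiquitita.mp3", "views": 1500, "parent": abba},
    {"path": f"{abba}/s.o.s.mp3", "views": 1500, "parent": abba},
    {"path": abba, "views": 6000, "parent": "music/pop"},
    {"path": f"{abba}/voulez-vous", "views": 3000, "parent": abba},
]

print(find_inconsistent_totals(rows))
```

On this slice the check reports music/pop/abba, whose 6000 views do not cover the 9000 held by its four tracks and the voulez-vous folder.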

NicolasPA • Dec 01 '23 05:12

Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. If you'd like to submit a PR, we'd be happy to prioritize a review, and if it's a request for tech support, please post in our community forum. Thank you - @gvwilson

gvwilson • Jul 11 '24 22:07