plotly.py
Sunburst and Treemap graphs do not render with `branchvalues="total"`
Good evening everyone,
I was writing up a small article when I encountered an error with the `branchvalues="total"` setting on `px.sunburst` as well as `px.treemap`.
The bug is that the plot renders only whitespace.
Changing the `branchvalues` parameter back to `remainder` renders the graph as expected. But since I am summing up the `views` for each parent node, `remainder` is not the right format for this data.
Can anyone reproduce these results? Any ideas what I can do to circumvent this issue?
Thanks in advance! Cheers
Minimum working sample
Treemap branchvalues="total"
Treemap branchvalues="remainder"
Sunburst branchvalues="total"
Sunburst branchvalues="remainder"
Intermediate Dataset
path | score | views | parent |
---|---|---|---|
music/pop/jackson/billie_jean.mp3 | 0.8000 | 1000 | music/pop/jackson |
music/pop/jackson/beat_it.mp3 | 0.9000 | 2000 | music/pop/jackson |
music/pop/abba/dancing_queen.mp3 | 0.7000 | 1500 | music/pop/abba |
music/pop/abba/voulez-vous/voulez-vous.mp3 | 0.7500 | 1500 | music/pop/abba/voulez-vous |
music/pop/abba/voulez-vous/summer_night_city.mp3 | 0.8000 | 1500 | music/pop/abba/voulez-vous |
music/pop/abba/waterloo.mp3 | 0.8000 | 1500 | music/pop/abba |
music/pop/abba/chiquitita.mp3 | 0.7000 | 1500 | music/pop/abba |
music/pop/abba/s.o.s.mp3 | 0.7000 | 1500 | music/pop/abba |
music/rock/queen/bohemian_rhapsody.mp3 | 0.9000 | 3000 | music/rock/queen |
music/pop/abba | 0.7000 | 6000 | music/pop |
music/pop/abba/voulez-vous | 0.7750 | 3000 | music/pop/abba |
music/pop/jackson | 0.8500 | 3000 | music/pop |
music/rock/queen | 0.9000 | 3000 | music/rock |
music/pop | 0.7750 | 9000 | music |
music/rock | 0.9000 | 3000 | music |
music | 0.8375 | 12000 | None |
Source code
# %%
import typing as t
import pandas as pd
import plotly.express as px
# %%
data = pd.DataFrame([
{ 'path': 'music/pop/jackson/billie_jean.mp3', 'score': 0.8, 'views': 1000 },
{ 'path': 'music/pop/jackson/beat_it.mp3', 'score': 0.9, 'views': 2000 },
{ 'path': 'music/pop/abba/dancing_queen.mp3', 'score': 0.7, 'views': 1500 },
{ 'path': 'music/pop/abba/voulez-vous/voulez-vous.mp3', 'score': 0.75, 'views': 1500 },
{ 'path': 'music/pop/abba/voulez-vous/summer_night_city.mp3', 'score': 0.8, 'views': 1500 },
{ 'path': 'music/pop/abba/waterloo.mp3', 'score': 0.8, 'views': 1500 },
{ 'path': 'music/pop/abba/chiquitita.mp3', 'score': 0.7, 'views': 1500 },
{ 'path': 'music/pop/abba/s.o.s.mp3', 'score': 0.7, 'views': 1500 },
{ 'path': 'music/rock/queen/bohemian_rhapsody.mp3', 'score': 0.9, 'views': 3000 },
])
# %%
col_path: str = 'path'
col_parent: str = 'parent'
def path_parent_fn(path):
    path = path.split('/')
    path = '/'.join(path[:-1]) if len(path) > 0 else ''
    path = path.strip()
    return path if len(path) > 0 else None
aggregation = { 'score': 'median', 'views': 'sum' }
# %%
PathT = t.TypeVar('PathT')
Axis = t.Union[int, str]
def create_hierarchy_data(
    data: pd.DataFrame,
    col_path: Axis,
    col_parent: Axis,
    path_parent_fn: t.Callable[[PathT], t.Union[PathT, None]],
    aggregation: t.Any
):
    data[col_parent] = data[col_path].apply(path_parent_fn)

    def parent_in_data_or_na():
        return data[col_parent].isin(data[col_path]) | data[col_parent].isna()

    while not parent_in_data_or_na().all(skipna=True):
        missing_parents = data[data[col_parent].isin(data[col_path]) == False]
        missing_parents = missing_parents.groupby(col_parent, as_index=False)
        missing_parents_keys = missing_parents.groups.keys()
        missing_parents = missing_parents.agg(aggregation)
        missing_parents[col_path] = missing_parents_keys
        missing_parents[col_parent] = missing_parents[col_path].apply(path_parent_fn)
        data = pd.concat([
            data,
            missing_parents
        ], ignore_index=True)
    data = data[data[col_path].isna() == False]
    return data
# %%
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
data
# %%
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='views', color='score', color_continuous_midpoint=0.5, branchvalues='total')
fig
# %%
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.sunburst(data, names=col_path, parents=col_parent, values='views', color='score', color_continuous_midpoint=0.5, branchvalues='total')
fig
The only way I found to render this data is to use `branchvalues='remainder'` and set the value to zero in all nodes except the leaves.
Applying this fix means extending the aggregation with a column whose computed value is zero for every generated node:
aggregation = { 'score': 'median', 'views': 'sum', 'layout_value': lambda x: 0 }
The `data['layout_value']` column is initialized from the current value column `data['views']` and is then used instead of `views` as the value column when rendering the figure:
# get_data() is not shown in this thread; it presumably returns the original leaf-level DataFrame
data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, branchvalues='remainder')
fig
In this figure every generated node has a value of zero, but `branchvalues='remainder'` ensures that each generated parent node inherits the size of its child nodes.
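To make the `remainder` arithmetic concrete: as the Plotly documentation describes it, under `branchvalues='remainder'` a branch's own value is counted on top of its descendants, so a generated node with value 0 ends up exactly as large as its children combined. The following is a minimal pure-Python sketch of that arithmetic, not Plotly's actual layout code; `effective_sizes` is a hypothetical helper, and the three rows are taken from the dataset above.

```python
from collections import defaultdict

# Own values: the generated branch is 0, the leaves keep their real views.
values = {
    "music/pop/jackson": 0,
    "music/pop/jackson/billie_jean.mp3": 1000,
    "music/pop/jackson/beat_it.mp3": 2000,
}
parents = {
    "music/pop/jackson": None,  # treated as the root of this small example
    "music/pop/jackson/billie_jean.mp3": "music/pop/jackson",
    "music/pop/jackson/beat_it.mp3": "music/pop/jackson",
}


def effective_sizes(values, parents):
    """Own value plus the effective size of every child ('remainder' reading)."""
    children = defaultdict(list)
    for node, parent in parents.items():
        if parent is not None:
            children[parent].append(node)

    def size(node):
        return values[node] + sum(size(child) for child in children[node])

    return {node: size(node) for node in values}


sizes = effective_sizes(values, parents)
print(sizes["music/pop/jackson"])  # 3000
```

This is why the workaround renders: the zero on the generated node adds nothing, and the node's area comes entirely from its leaves.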
Finally, rendering a custom hover that shows the real value instead of the Plotly workaround value yields a beautiful graph.
data = get_data()
data['layout_value'] = data['views']
data = create_hierarchy_data(data, col_path, col_parent, path_parent_fn, aggregation)
fig = px.treemap(data, names=col_path, parents=col_parent, values='layout_value', color='score', color_continuous_midpoint=0.5, hover_data=['views', 'score'])
fig.update_traces(hovertemplate='''
<b>%{label}</b><br>Views: %{customdata[0]}<br>Score: %{customdata[1]}
''')
fig
Although this workaround exists, it compromises the "low-code" claim Plotly makes, so a fix would still be great!
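A possible alternative to the zero-value workaround is to give every branch the sum of all its descendant leaves, so the values are consistent with `branchvalues='total'`. This assumes the blank render comes from branch values that are smaller than the sum of their children, which a later comment in this thread suggests but which has not been confirmed against Plotly here. The sketch below is pure Python; `ancestors`, `parent_of`, `branch_totals` and `records` are hypothetical names, and only the `views` column is handled.

```python
from collections import defaultdict

# Leaf rows from the minimum working sample (path -> views).
leaf_views = {
    "music/pop/jackson/billie_jean.mp3": 1000,
    "music/pop/jackson/beat_it.mp3": 2000,
    "music/pop/abba/dancing_queen.mp3": 1500,
    "music/pop/abba/voulez-vous/voulez-vous.mp3": 1500,
    "music/pop/abba/voulez-vous/summer_night_city.mp3": 1500,
    "music/pop/abba/waterloo.mp3": 1500,
    "music/pop/abba/chiquitita.mp3": 1500,
    "music/pop/abba/s.o.s.mp3": 1500,
    "music/rock/queen/bohemian_rhapsody.mp3": 3000,
}


def ancestors(path):
    """All proper ancestor paths, e.g. 'a/b/c' -> ['a', 'a/b']."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


def parent_of(path):
    """Direct parent path, or None for the root."""
    head, sep, _ = path.rpartition("/")
    return head if sep else None


# Add each leaf's views to every ancestor, not only to its direct parent.
branch_totals = defaultdict(int)
for path, views in leaf_views.items():
    for ancestor in ancestors(path):
        branch_totals[ancestor] += views

# One record per node (leaves and branches), ready to load into a DataFrame.
records = [
    {"path": path, "parent": parent_of(path), "views": views}
    for path, views in {**leaf_views, **branch_totals}.items()
]

print(branch_totals["music/pop/abba"])  # 9000
print(branch_totals["music"])           # 15000
```

`pd.DataFrame(records)` could then be passed to `px.treemap` or `px.sunburst` with `names='path'`, `parents='parent'`, `values='views'` and `branchvalues='total'`. The `score` column is left out because a median over all descendant leaves is a different statistic from the median of medians that the original loop produces.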
I also have this issue. But it seems to be dataset dependent.
My real dataset has 3 levels and highly variable values, from 0 to trillions. This doesn't render. However, if I remove the 3rd level, I manage to render the first 2 levels fine.
When I use a test dataset that also has 3 levels but less variability, with the exact same code, it works just fine with the 3 levels.
So the issue seems to be data dependent.
I haven't managed yet to isolate what property of the data makes it fail.
Is there any way to get some plotly error logs to understand where the render fails?
Ok, so my issue is that the sums were not correctly adding up from the 3rd level in my real dataset.
You may want to check if it was the case for you too.
I wish Plotly produced an error for this kind of render-breaking mistake.
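A check along these lines could be run before plotting. It is a minimal sketch that assumes `branchvalues='total'` needs every branch value to be at least the sum of its direct children; `find_inconsistent_branches` and `parent_of` are hypothetical helpers, and the values are the `views` column of the intermediate dataset posted at the top of this thread.

```python
from collections import defaultdict

# path -> views, copied from the intermediate dataset in the first post.
views = {
    "music/pop/jackson/billie_jean.mp3": 1000,
    "music/pop/jackson/beat_it.mp3": 2000,
    "music/pop/abba/dancing_queen.mp3": 1500,
    "music/pop/abba/voulez-vous/voulez-vous.mp3": 1500,
    "music/pop/abba/voulez-vous/summer_night_city.mp3": 1500,
    "music/pop/abba/waterloo.mp3": 1500,
    "music/pop/abba/chiquitita.mp3": 1500,
    "music/pop/abba/s.o.s.mp3": 1500,
    "music/rock/queen/bohemian_rhapsody.mp3": 3000,
    "music/pop/abba": 6000,
    "music/pop/abba/voulez-vous": 3000,
    "music/pop/jackson": 3000,
    "music/rock/queen": 3000,
    "music/pop": 9000,
    "music/rock": 3000,
    "music": 12000,
}


def parent_of(path):
    """Direct parent path, or None for the root."""
    head, sep, _ = path.rpartition("/")
    return head if sep else None


def find_inconsistent_branches(values):
    """Map each branch whose value is below its children's sum to (value, children_sum)."""
    child_sum = defaultdict(int)
    for path, value in values.items():
        parent = parent_of(path)
        if parent is not None:
            child_sum[parent] += value
    return {
        path: (values[path], total)
        for path, total in child_sum.items()
        if values[path] < total
    }


print(find_inconsistent_branches(views))
# {'music/pop/abba': (6000, 9000)}
```

For the dataset in the first post, the check flags `music/pop/abba`: its 6000 covers only the four files directly in that folder and omits the 3000 from the `voulez-vous` subfolder, so `music/pop` and `music` are also 3000 short of the true leaf total of 15000.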
Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. If you'd like to submit a PR, we'd be happy to prioritize a review, and if it's a request for tech support, please post in our community forum. Thank you - @gvwilson