Altering common_norm to allow additional normalizations

Open chrisroat opened this issue 3 years ago • 5 comments

In making a displot with stat="percent", there are two normalizations, controlled by the boolean common_norm. Either the entire figure is normalized, or each individual group is normalized. I would find it useful to allow per-facet normalization, as well.

I'd like to propose that common_norm be allowed to take on string values like "figure" (same as True today), "group" (same as False today), "facet", and "hue".

chrisroat • Feb 12 '22 05:02

Something like this is on the roadmap, but I don't think this is the right API. Better to accept a list of grouping variables (e.g. common_norm=["col", "hue"]) which

  • would not require the relevant statistical code to know what a "facet" is
  • wouldn't be ambiguous in other cases where multiple distributions appear in one "figure" (e.g. jointplot)
  • would allow normalization across rows/cols when both are present
  • would be forward compatible with distribution plots gaining additional semantic parameters (size/style)

mwaskom • Feb 26 '22 18:02
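To make the list-based idea concrete, here is a minimal pure-Python sketch of normalizing counts within groups defined by a list of variable names. It is not seaborn's implementation: `normalized_counts`, its arguments, and the toy rows are hypothetical, and it assumes the list names the variables that define each normalization group (so an empty list means one normalization for everything).

```python
from collections import Counter


def normalized_counts(rows, count_by, norm_by):
    """Hypothetical helper: share of each `count_by` cell within its `norm_by` group.

    `norm_by` must be a subset of `count_by`. An empty `norm_by` normalizes
    over all rows at once.
    """
    cell_counts = Counter(tuple(row[v] for v in count_by) for row in rows)
    group_totals = Counter(tuple(row[v] for v in norm_by) for row in rows)
    result = {}
    for cell, n in cell_counts.items():
        values = dict(zip(count_by, cell))
        group = tuple(values[v] for v in norm_by)
        result[cell] = n / group_totals[group]
    return result


# Toy observations: one facet variable, one hue variable, one histogram bin.
rows = [
    {"col": "left", "hue": "a", "bin": 0},
    {"col": "left", "hue": "a", "bin": 1},
    {"col": "left", "hue": "b", "bin": 0},
    {"col": "left", "hue": "b", "bin": 0},
    {"col": "right", "hue": "a", "bin": 1},
    {"col": "right", "hue": "b", "bin": 1},
]
cells = ["col", "hue", "bin"]

per_figure = normalized_counts(rows, cells, [])             # like common_norm=True
per_group = normalized_counts(rows, cells, ["col", "hue"])  # like common_norm=False
per_facet = normalized_counts(rows, cells, ["col"])         # the requested case

print(per_facet)
```

The statistical code only ever sees variable names, which is the point of the first bullet above: it never needs to know what a "facet" is.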

Thanks for the follow-up. I like the more general approach. If I understand what you propose, then for a "facet" normalization one would use ["row", "col"]?

If it exists, can you provide a link to the roadmap item that would cover this?

chrisroat • Feb 27 '22 04:02

Thinking about the related issue "Plotting conditional distribution with different hues" and looking at the code, I think this would be impossible with sns.FacetGrid(), because sns.FacetGrid() splits the data by col, row, and hue, so each subset no longer has the totals needed to compute a probability conditioned on col, row, etc.

The easiest workaround would be to calculate the estimated conditional probabilities yourself and plot the bars more directly. Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns

# BankWages is assumed to be an already-loaded pandas DataFrame
# with 'minority', 'gender', and 'job' columns.
BankWages['gender'] = BankWages['gender'].astype('category')

# Group by the variable(s) you want to condition on.
# reset_index(name=...) keeps the column name stable across pandas versions.
df_plot = (BankWages.groupby(['minority'])[['gender', 'job']]
           .value_counts(normalize=True)
           .reset_index(name='proportion'))

def plt_bar(x, y, hue, **kwargs):
    # FacetGrid.map passes a 'color' keyword; drop it and color each hue level here
    kwargs.pop('color', None)
    ax = plt.gca()
    for icat, cat in enumerate(hue.cat.categories):
        color = sns.color_palette()[icat]
        ax.bar(x=x[hue == cat], height=y[hue == cat], color=color, **kwargs)
    return ax

g = sns.FacetGrid(df_plot, col='minority')
g.map(plt_bar, 'job', 'proportion', 'gender', width=0.8, alpha=0.5)

For plots like those made with multiple='dodge', more code seems to be required, something like this grouped bar chart.

But then again, if we could bring the common_norm= parameter to sns.FacetGrid() and make some exceptions to how it splits the data, we might be able to draw bar plots of probabilities conditioned on the variables listed in common_norm=. After all, displot() seems to use sns.FacetGrid().

I wonder what kinds of plots need more than row-, col-, and hue-level conditional estimates?

kwhkim • Nov 09 '23 03:11
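As a quick check on the workaround above, the sketch below shows how the choice of groupby keys decides what the proportions are conditioned on. The `bank` frame is a tiny made-up stand-in for BankWages (the real data is not reproduced here), and it assumes a pandas version that provides `DataFrameGroupBy.value_counts`.

```python
import pandas as pd

# Tiny synthetic stand-in for BankWages; the values are invented.
bank = pd.DataFrame({
    "minority": ["no", "no", "no", "no", "yes", "yes", "yes", "yes"],
    "gender": ["male", "male", "female", "female",
               "male", "female", "female", "female"],
    "job": ["admin", "manage", "admin", "admin",
            "custodial", "admin", "admin", "manage"],
})

# P(gender, job | minority): proportions sum to 1 within each minority level.
by_minority = (
    bank.groupby("minority")[["gender", "job"]]
    .value_counts(normalize=True)
    .reset_index(name="proportion")
)

# P(job | minority, gender): proportions sum to 1 within each (minority, gender) pair.
by_minority_gender = (
    bank.groupby(["minority", "gender"])[["job"]]
    .value_counts(normalize=True)
    .reset_index(name="proportion")
)

totals_minority = by_minority.groupby("minority")["proportion"].sum()
totals_pair = by_minority_gender.groupby(["minority", "gender"])["proportion"].sum()
print(totals_minority)
print(totals_pair)
```

Either table can then be handed to a plain bar-drawing function, since the normalization has already been done before any faceting takes place.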

For completeness, the objects interface accepts the list-based common_norm values mentioned earlier, e.g. common_norm=["row", "col"] will group only on the row and column variables, disregarding the color.

@kwhkim the issue with your suggestion is that common_norm is a parameter of histplot specifically, which complicates things. Also, with your new example, why not use a catplot with kind="bar"?

thuiop • Nov 09 '23 09:11

@thuiop catplot is a great suggestion, and it works fine! I thought it only worked for visualizing means or similar summary statistics.

g = sns.catplot(df_plot,
                kind='bar', col='minority', hue='gender',
                hue_order=['male', 'female'],
                x='job', y='proportion',
                height=2, aspect=7/2/2)  # optionally: errorbar=None

I think "the final piece of the puzzle" would be visualizing a heat map or 2D histogram with different conditional probabilities, something like:

sns.displot(data=BankWages, row='minority',
            x='job', y='gender', cbar=True, # cbar : colorbar
            height=3, aspect=1*3/2, 
            stat='probability', common_norm=False) 

This one looks impossible to solve without sns.FacetGrid() and sns.heatmap()... Can seaborn objects solve this?

kwhkim • Nov 09 '23 09:11