Create `freq_plot` function (frequency plot for categorical variables)

Open • pablo14 opened this issue 4 years ago • 1 comment

In funModeling, the `freq` function plots the frequency of every categorical variable.

The code below does something similar:

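import seaborn as sns
import matplotlib.pyplot as plt
# Assumption: cat_vars (names of the categorical columns) is the helper from funpymodeling.exploratory
from funpymodeling.exploratory import cat_vars

sns.set()
tips = sns.load_dataset("tips")

d_plot = tips
# Fixed 4x2 grid: tips has only 4 categorical columns, so half of the cells stay empty
fig, ax = plt.subplots(4, 2, figsize=(20, 20))
for variable, subplot in zip(cat_vars(d_plot), ax.flatten()):
    # Horizontal bars, most frequent category first
    sns.countplot(y=d_plot[variable], ax=subplot, order=d_plot[variable].value_counts().index)
    for label in subplot.get_xticklabels():
        label.set_rotation(90)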

It shows:

image

  1. Long category names overlap across the plots (not the case here, but it happens when the names are long).
  2. Don't create empty grids (calculate the number of plots dynamically).
  3. It needs to show the absolute count and the relative percentage per bar, as shown below:

image

This data is already calculated by the function freq_tbl in this package.
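Points 2 and 3 could be sketched roughly as follows. This is only an illustration: `grid_shape` and `bar_labels` are hypothetical helper names, and the `Counter` is a stand-in for the counts and percentages that `freq_tbl` already provides.

```python
import math
from collections import Counter


def grid_shape(n_plots, ncols=2):
    """Return (nrows, ncols) with just enough rows to hold n_plots subplots."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    ncols = min(ncols, n_plots)          # never more columns than plots
    nrows = math.ceil(n_plots / ncols)   # round up so every plot gets a cell
    return nrows, ncols


def bar_labels(values):
    """Map each category to an 'absolute (relative%)' label, most frequent first."""
    counts = Counter(values)             # stand-in for the freq_tbl output
    total = sum(counts.values())
    return {cat: f"{n} ({n / total:.1%})" for cat, n in counts.most_common()}
```

With an odd number of variables one cell of the last row is still unused; it can be dropped from the figure afterwards, for example with matplotlib's `fig.delaxes(ax)`.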

  4. If a variable has more than 100 distinct categories, the plot should group the least frequent ones into a single "other" category, to avoid crashing.

  5. It should use the `todf()` function (from funpymodeling) to convert the input to a dataframe, so that `freq_plot` supports 1D/2D numpy arrays, pandas series and 1D/2D lists.
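A minimal sketch of the grouping in point 4, assuming the input is already a pandas series. The function name `lump_rare_categories`, its parameters and the `"other"` label are assumptions for illustration, not part of funpymodeling.

```python
import pandas as pd


def lump_rare_categories(series, max_categories=100, other_label="other"):
    """Keep the most frequent categories and relabel the rest as other_label."""
    counts = series.value_counts()       # sorted by frequency, NaN excluded
    if len(counts) <= max_categories:
        return series                    # few enough categories, nothing to do
    # Reserve one slot for the "other" bucket itself
    keep = counts.index[: max_categories - 1]
    # Cast to object so the new label also works on categorical dtypes
    as_object = series.astype(object)
    # Missing values are left untouched instead of being counted as "other"
    return as_object.where(as_object.isin(keep) | as_object.isna(), other_label)
```

For example, `lump_rare_categories(d_plot[variable])` could be applied to each column before it is passed to `sns.countplot`.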

pablo14 • Aug 28 '20 16:08

Can we use plotly? :P

aoelvp94 • Aug 29 '20 05:08