Support Categorical Correlation

Open jinglinpeng opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Currently, plot_correlation only works for numerical variables. This issue extends plot_correlation to support categorical variables.

Describe the solution you'd like

  1. plot_correlation(df): Add a Cramér's V correlation matrix for all categorical columns. Time: 2021.01.20-2021.01.27
  2. plot_correlation(df, x = cat): Add Cramér's V correlation for categorical columns. Time: 2021.01.27-2021.02.03
  3. Add docs and tests. Time: 2021.02.03-2021.02.10
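For readers unfamiliar with the measure, here is a minimal sketch of how Cramér's V and a pairwise matrix over categorical columns could be computed. It is not dataprep's implementation: the helper names `cramers_v` and `cramers_v_matrix` are hypothetical, the input is assumed to be plain lists of labels rather than a DataFrame, and no bias correction is applied. It uses only the standard library so that it runs without dependencies.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two equal-length sequences of category labels.

    Hypothetical helper for illustration; uncorrected (no bias correction).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    row_totals = Counter(x)          # marginal counts of x
    col_totals = Counter(y)          # marginal counts of y
    observed = Counter(zip(x, y))    # joint counts (contingency table)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        # Undefined when the input is empty or a column has one category.
        return float("nan")
    # Pearson chi-squared statistic over every cell of the table.
    chi2 = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def cramers_v_matrix(columns):
    """Pairwise Cramér's V for a dict mapping column name -> list of labels."""
    names = list(columns)
    return {a: {b: cramers_v(columns[a], columns[b]) for b in names}
            for a in names}


if __name__ == "__main__":
    data = {
        "color": ["red", "red", "blue", "blue"],
        "shape": ["dot", "dot", "box", "box"],   # fully determined by color
        "size": ["s", "l", "s", "l"],            # independent of color
    }
    for name, row in cramers_v_matrix(data).items():
        print(name, row)
```

The value ranges from 0 (no association) to 1 (one variable fully determines the other), so the resulting matrix can be drawn as a heatmap in the same way as a numerical correlation matrix. Unlike Pearson correlation it is unsigned, and it is symmetric in its two arguments.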

Reference:

  1. https://towardsdatascience.com/powerful-eda-exploratory-data-analysis-in-just-two-lines-of-code-using-sweetviz-6c943d32f34
  2. https://medium.com/@outside2SDs/an-overview-of-correlation-measures-between-categorical-and-continuous-variables-4c7f85610365

Describe alternatives you've considered

NA

Additional context

NA

jinglinpeng avatar Jan 20 '21 00:01 jinglinpeng

Hi @jinglinpeng, I suggest that you add Phik correlation too.

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Extensive documentation is available here: https://phik.readthedocs.io/en/latest/.

Abdelgha-4 avatar Aug 11 '21 01:08 Abdelgha-4

Thanks @Abdelgha-4 for the suggestion! Indeed, we once considered the PhiK correlation at https://github.com/sfu-db/dataprep/pull/145. However, PhiK is generally very slow compared to other correlations, so we decided to defer the implementation until someone thinks it is really needed.

dovahcrow avatar Aug 11 '21 21:08 dovahcrow

I see! Sorry then, I didn't notice it was already discussed.

Abdelgha-4 avatar Aug 12 '21 01:08 Abdelgha-4

> I see! Sorry then, I didn't notice it was already discussed.

No worries! If you think this is an important feature then we can certainly add it.

dovahcrow avatar Aug 12 '21 07:08 dovahcrow