
Alert inconsistency for 'High correlation' in JSON vs HTML Report

Open Samuele-Campioli opened this issue 2 years ago • 2 comments

Describe the bug

I'm not sure whether this is a feature or a bug, but I think it's the latter. When I render the normal ProfileReport, it returns the complete alert messages, but when I export the report to JSON, the alerts contain only partial messages.

To Reproduce

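import pandas as pd
from pandas_profiling import ProfileReport

file_name = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pd.read_csv(file_name)

# First command: render the HTML report (displayed inline in a notebook)
ProfileReport(df)

# Last command: write the same report to a JSON file
ProfileReport(df).to_file('./test_file.json')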

Result from the first command:

Alerts

Dataset has 2 (1.3%) duplicate rows Duplicates
sepal_length is highly correlated with petal_length and 1 other fields High correlation
petal_length is highly correlated with sepal_length and 1 other fields High correlation
petal_width is highly correlated with sepal_length and 1 other fields High correlation
sepal_length is highly correlated with petal_length and 1 other fields High correlation
petal_length is highly correlated with sepal_length and 1 other fields High correlation
petal_width is highly correlated with sepal_length and 1 other fields High correlation
sepal_length is highly correlated with petal_length and 1 other fields High correlation
petal_length is highly correlated with sepal_length and 1 other fields High correlation
petal_width is highly correlated with sepal_length and 1 other fields High correlation
sepal_length is highly correlated with sepal_width and 3 other fields High correlation
sepal_width is highly correlated with sepal_length and 3 other fields High correlation
petal_length is highly correlated with sepal_length and 3 other fields High correlation
petal_width is highly correlated with sepal_length and 3 other fields High correlation
species is highly correlated with sepal_length and 3 other fields High correlation
species is uniformly distributed Uniform

This is the result from the last command:

['[DUPLICATES] alert on column None', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column sepal_width', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column species', '[UNIFORM] alert on column species']
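The JSON entries above carry only the alert type and the column name, so the "and 1 other fields" detail from the HTML report cannot be recovered from them. A minimal sketch of that, assuming the strings keep the `[TYPE] alert on column NAME` shape shown in this issue (`parse_alert` is a hypothetical helper, not part of pandas-profiling):

```python
import re
from collections import Counter
from typing import Tuple

# Three entries copied from the JSON output in this issue.
json_alerts = [
    "[DUPLICATES] alert on column None",
    "[HIGH_CORRELATION] alert on column sepal_length",
    "[UNIFORM] alert on column species",
]

# Assumed format: "[TYPE] alert on column NAME".
ALERT_PATTERN = re.compile(r"^\[(?P<type>[A-Z_]+)\] alert on column (?P<column>.+)$")


def parse_alert(text: str) -> Tuple[str, str]:
    """Split one JSON alert string into (alert type, column name)."""
    match = ALERT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognised alert format: {text!r}")
    return match["type"], match["column"]


# Count alerts per type; type and column are all the JSON string exposes.
counts = Counter(parse_alert(entry)[0] for entry in json_alerts)
print(counts)
```

Nothing in the parsed result says which other fields are correlated, which is the gap this issue describes.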

Version information:

  • Python version: 3.8.10
  • Environment: Jupyter Notebook (local)
  • pandas: 1.3.1
  • pandas-profiling: 3.1.0

Samuele-Campioli • Mar 24 '22 13:03

Tracked this down to this line: https://github.com/ydataai/pandas-profiling/blob/develop/src/pandas_profiling/model/alerts.py#L103

sbrugman • May 09 '22 22:05

For JSON, the alert messages are currently defined in the file linked in the previous comment; for the HTML report, they are defined in the alert templates. For consistency, the messages should be defined once, on the Alert classes in the model, so that both outputs use the same text. Contributions welcome!
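One way the suggestion could look, as a rough sketch rather than the library's actual code: each alert class builds its own message, and both output formats call that method. The class names, the `fields` attribute, and the `render_json` / `render_html` helpers are all hypothetical and only illustrate the idea.

```python
import html
from typing import List, Optional, Sequence


class Alert:
    """Hypothetical base class: the message lives on the alert itself."""

    label = "ALERT"

    def __init__(self, column_name: Optional[str] = None) -> None:
        self.column_name = column_name

    def message(self) -> str:
        # Generic fallback, mirroring the format seen in the JSON output.
        return f"[{self.label}] alert on column {self.column_name}"


class HighCorrelationAlert(Alert):
    """Hypothetical subclass that knows which fields are correlated."""

    label = "HIGH_CORRELATION"

    def __init__(self, column_name: str, fields: Sequence[str]) -> None:
        super().__init__(column_name)
        self.fields = list(fields)

    def message(self) -> str:
        first, others = self.fields[0], len(self.fields) - 1
        text = f"{self.column_name} is highly correlated with {first}"
        if others:
            text += f" and {others} other fields"
        return text


def render_json(alerts: Sequence[Alert]) -> List[str]:
    """JSON side: a list of message strings."""
    return [alert.message() for alert in alerts]


def render_html(alerts: Sequence[Alert]) -> str:
    """HTML side: the same messages, escaped and wrapped in list items."""
    return "\n".join(f"<li>{html.escape(a.message())}</li>" for a in alerts)


alerts = [HighCorrelationAlert("sepal_length", ["petal_length", "petal_width"])]
print(render_json(alerts))
print(render_html(alerts))
```

With the message defined in one place, the JSON and HTML outputs cannot drift apart the way they do in this report.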

sbrugman • May 09 '22 23:05