ydata-profiling
Alert inconsistency for 'High correlation' in JSON vs HTML Report
Describe the bug
I don't know whether this is a feature or a bug, but I think it's the latter. When I display the regular ProfileReport, it shows the complete alert messages, but when I export the report to JSON, the alerts contain only partial messages.
To Reproduce
import pandas as pd
from pandas_profiling import ProfileReport
file_name = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pd.read_csv(file_name)
ProfileReport(df)
ProfileReport(df).to_file('./test_file.json')
Result from the first command:
Alerts
Message | Alert type |
---|---|
Dataset has 2 (1.3%) duplicate rows | Duplicates |
sepal_length is highly correlated with petal_length and 1 other fields | High correlation |
petal_length is highly correlated with sepal_length and 1 other fields | High correlation |
petal_width is highly correlated with sepal_length and 1 other fields | High correlation |
sepal_length is highly correlated with petal_length and 1 other fields | High correlation |
petal_length is highly correlated with sepal_length and 1 other fields | High correlation |
petal_width is highly correlated with sepal_length and 1 other fields | High correlation |
sepal_length is highly correlated with petal_length and 1 other fields | High correlation |
petal_length is highly correlated with sepal_length and 1 other fields | High correlation |
petal_width is highly correlated with sepal_length and 1 other fields | High correlation |
sepal_length is highly correlated with sepal_width and 3 other fields | High correlation |
sepal_width is highly correlated with sepal_length and 3 other fields | High correlation |
petal_length is highly correlated with sepal_length and 3 other fields | High correlation |
petal_width is highly correlated with sepal_length and 3 other fields | High correlation |
species is highly correlated with sepal_length and 3 other fields | High correlation |
species is uniformly distributed | Uniform |
This is the result from the last command:
['[DUPLICATES] alert on column None', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column sepal_length', '[HIGH_CORRELATION] alert on column sepal_width', '[HIGH_CORRELATION] alert on column petal_length', '[HIGH_CORRELATION] alert on column petal_width', '[HIGH_CORRELATION] alert on column species', '[UNIFORM] alert on column species']
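For anyone reproducing this, the list above can be read back from the written file. This is a minimal sketch, assuming the JSON report stores the alerts under a top-level `alerts` key (an assumption based on the 3.1.0 output; other versions may use a different key). `read_alerts` is a hypothetical helper, not part of the library.

```python
import json
from pathlib import Path


def read_alerts(path):
    """Return the alert strings stored in a JSON report (hypothetical helper)."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the alerts live under the top-level "alerts" key.
    return report.get("alerts", [])


if __name__ == "__main__":
    # Only prints something if the file from the reproduction steps exists.
    report_path = Path("./test_file.json")
    if report_path.exists():
        for alert in read_alerts(report_path):
            print(alert)
```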
Version information:
- Python version: 3.8.10
- Environment: Jupyter Notebook (local)
- pandas: 1.3.1
- pandas-profiling: 3.1.0
Tracked this down to this line: https://github.com/ydataai/pandas-profiling/blob/develop/src/pandas_profiling/model/alerts.py#L103
For JSON, the alert messages are currently defined in the file linked in the previous comment, while for the HTML report they are defined in the alert templates. For consistency, the messages should be defined on the Alert classes in the model. Contributions welcome!
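To make the suggestion concrete, here is a rough sketch of what "messages defined on the Alert classes" could look like. It is not the library's actual code: the class names, fields, and the two renderer functions (`to_json_alerts`, `to_html_row`) are all hypothetical, and the message wording is copied from the HTML output quoted in this issue. The point is only that both outputs call the same `message()` method.

```python
import html
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical base alert; not the library's actual class."""

    column_name: Optional[str] = None
    alert_type: str = "ALERT"

    def message(self) -> str:
        # Fallback mirrors the short form currently seen in the JSON output.
        return f"[{self.alert_type}] alert on column {self.column_name}"


@dataclass
class HighCorrelationAlert(Alert):
    """Hypothetical alert that owns its full message."""

    alert_type: str = "HIGH_CORRELATION"
    fields: List[str] = field(default_factory=list)

    def message(self) -> str:
        if not self.fields:
            return super().message()
        text = f"{self.column_name} is highly correlated with {self.fields[0]}"
        others = len(self.fields) - 1
        if others > 0:
            text += f" and {others} other fields"
        return text


def to_json_alerts(alerts: List[Alert]) -> List[str]:
    # JSON export: one message string per alert.
    return [alert.message() for alert in alerts]


def to_html_row(alert: Alert) -> str:
    # HTML export: the same message, escaped and wrapped in a table row.
    return f"<tr><td>{html.escape(alert.message())}</td></tr>"


if __name__ == "__main__":
    alert = HighCorrelationAlert(
        column_name="sepal_length", fields=["petal_length", "petal_width"]
    )
    print(to_json_alerts([alert]))
    print(to_html_row(alert))
```

With this shape, a template would only decide on markup, and any change to the wording would show up in the JSON and HTML outputs at the same time.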