ydata-profiling
Bins equal to 0 throws an error on histograms
Describe the bug
Setting plot.histogram.bins to zero causes an error. I suspect the error comes from this part of the code, where an if condition may be missing:
https://github.com/pandas-profiling/pandas-profiling/blob/939969061459a7df5e0ed425cd471827924c4eed/src/pandas_profiling/model/summary_algorithms.py#L29-L45
To Reproduce
Data:
See pandas DataFrame
df = pd.DataFrame({'col1': {0: 1, 1: 1, 2: 6, 3: 2, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1},
'col2': {0: 49,
1: 92,
2: 22,
3: 115,
4: 112,
5: 84,
6: 14,
7: 53,
8: 14,
9: 13},
'col3': {0: 3.0,
1: 1.0,
2: 3.0,
3: 7.0,
4: 4.0,
5: 29.0,
6: 4.0,
7: 4.0,
8: 3.0,
9: 10.0},
'col4': {0: False,
1: False,
2: False,
3: True,
4: False,
5: False,
6: False,
7: False,
8: False,
9: False},
'col5': {0: False,
1: False,
2: False,
3: False,
4: False,
5: False,
6: False,
7: False,
8: False,
9: False},
'col6': {0: False,
1: False,
2: False,
3: True,
4: False,
5: False,
6: False,
7: False,
8: False,
9: False},
'col7': {0: False,
1: True,
2: False,
3: True,
4: True,
5: False,
6: True,
7: True,
8: True,
9: False},
'col8': {0: True,
1: False,
2: True,
3: False,
4: False,
5: True,
6: False,
7: False,
8: False,
9: True},
'col9': {0: 3,
1: 7,
2: 26,
3: 14,
4: 11,
5: 1,
6: 4,
7: 2,
8: 1,
9: 3},
'col10': {0: 2.25,
1: 3.9800000190734863,
2: 0.0,
3: 4.289999961853027,
4: 3.569999933242798,
5: 3.1600000858306885,
6: 4.329999923706055,
7: 3.5,
8: 3.680000066757202,
9: 0.0},
'col11': {0: 116,
1: 11688,
2: 0,
3: 5941,
4: 11787,
5: 8716,
6: 718,
7: 653,
8: 8544,
9: 0},
'col12': {0: 0.13655671651983892,
1: 0.08928571428571419,
2: 0.3009205983889529,
3: 0.11885245901639352,
4: 0.43939393939393945,
5: 0.0,
6: 0.28411633109619694,
7: 0.05050505050505061,
8: 0.0,
9: 0.025316455696202445},
'col13': {0: 4,
1: 15,
2: 29,
3: 17,
4: 11,
5: 19,
6: 7,
7: 11,
8: 9,
9: 12}})
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   col1    110607 non-null  int32
 1   col2    110607 non-null  int32
 2   col3    110600 non-null  float64
 3   col4    110607 non-null  bool
 4   col5    110607 non-null  bool
 5   col6    110607 non-null  bool
 6   col7    110607 non-null  bool
 7   col8    110607 non-null  bool
 8   col9    110607 non-null  int32
 9   col10   110607 non-null  float32
 10  col11   110607 non-null  int32
 11  col12   110607 non-null  float64
 12  col13   110607 non-null  int64
Code:
"""
Test for issue 884:
https://github.com/pandas-profiling/pandas-profiling/issues/884
"""
import pandas as pd
from pandas_profiling import ProfileReport

# df is the DataFrame defined above
profile = ProfileReport(df, plot={"histogram": {"bins": 0}})
TypeError: Automated estimation of the number of bins is not supported for weighted data
Version information:
- Python version: 3.8.10
- Environment: Jupyter Notebook
- pip:
See env libraries
```
Antergos Linux | 2015.10 (ISO-Rolling) | appdirs | 1.4.4 | backcall | 0.2.0
boto3 | 1.16.7 | botocore | 1.19.7 | certifi | 2020.12.5
chardet | 4.0.0 | cycler | 0.10.0 | Cython | 0.29.23
dbus-python | 1.2.16 | decorator | 5.0.6 | distlib | 0.3.2
distro-info | 0.23ubuntu1 | facets-overview | 1.0.0 | filelock | 3.0.12
idna | 2.10 | ipykernel | 5.3.4 | ipython | 7.22.0
ipython-genutils | 0.2.0 | jedi | 0.17.2 | jmespath | 0.10.0
joblib | 1.0.1 | jupyter-client | 6.1.12 | jupyter-core | 4.7.1
kiwisolver | 1.3.1 | koalas | 1.8.1 | matplotlib | 3.4.2
numpy | 1.19.2 | pandas | 1.2.4 | parso | 0.7.0
patsy | 0.5.1 | pexpect | 4.8.0 | pickleshare | 0.7.5
Pillow | 8.2.0 | pip | 21.0.1 | plotly | 4.14.3
prompt-toolkit | 3.0.17 | protobuf | 3.17.2 | psycopg2 | 2.8.5
ptyprocess | 0.7.0 | pyarrow | 4.0.0 | Pygments | 2.8.1
PyGObject | 3.36.0 | pyparsing | 2.4.7 | python-apt | 2.0.0+ubuntu0.20.4.5
python-dateutil | 2.8.1 | pytz | 2020.5 | pyzmq | 20.0.0
requests | 2.25.1 | requests-unixsocket | 0.2.0 | retrying | 1.3.3
s3transfer | 0.3.7 | scikit-learn | 0.24.1 | scipy | 1.6.2
seaborn | 0.11.1 | setuptools | 52.0.0 | six | 1.15.0
ssh-import-id | 5.10 | statsmodels | 0.12.2 | threadpoolctl | 2.1.0
tornado | 6.1 | traitlets | 5.0.5 | unattended-upgrades | 0.1
urllib3 | 1.25.11 | virtualenv | 20.4.1 | wcwidth | 0.2.5
wheel | 0.36.2
```
Additional context
Good catch! Automated bin detection is (apparently) not supported for weighted data - which we use to greatly improve the performance of the profiling. This option should be disabled (or implemented).
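For reference, the same TypeError can be reproduced with NumPy alone, which suggests it comes from passing `bins="auto"` together with `weights` to `np.histogram`. The sketch below shows that, plus one possible guard: estimate the bin edges on the unweighted values first, then pass the explicit edges along with the weights. The helper name `weighted_histogram` and the convention that `bins == 0` means "auto" are assumptions for illustration, not the library's actual code, and edges estimated without weights may differ from what an estimator would pick on the fully expanded data.

```python
import numpy as np

# Distinct values of col1 from the report and how often each occurs.
values = np.array([1, 2, 6])
counts = np.array([8, 1, 1])

# Reproduce the reported error without the profiling library.
raised = False
message = ""
try:
    np.histogram(values, bins="auto", weights=counts)
except TypeError as exc:
    raised = True
    message = str(exc)


def weighted_histogram(values, weights, bins=0):
    """Hypothetical helper: bins == 0 is treated as 'choose automatically'."""
    bins_arg = "auto" if bins == 0 else bins
    # Estimate the edges on the unweighted values, where "auto" is allowed.
    edges = np.histogram_bin_edges(values, bins=bins_arg)
    # Explicit edges are accepted together with weights.
    return np.histogram(values, bins=edges, weights=weights)


hist, edges = weighted_histogram(values, counts, bins=0)
print(raised, message)
print(hist, edges)
```

Until something like this lands, setting `plot.histogram.bins` to a positive integer avoids the automatic estimation path altogether.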