ydata-profiling

Setting bins to 0 throws an error on histograms

Open • ciberger opened this issue 3 years ago • 1 comment

Describe the bug

Setting plot.histogram.bins to zero causes an error. I suspect the problem lies in this part of the code; there may be a missing if condition:

https://github.com/pandas-profiling/pandas-profiling/blob/939969061459a7df5e0ed425cd471827924c4eed/src/pandas_profiling/model/summary_algorithms.py#L29-L45
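One possible shape for that missing check, as a hypothetical sketch: the helper name resolve_histogram_bins and its parameters are invented for illustration and are not part of pandas-profiling. It assumes, as the linked lines appear to, that a bin count of 0 is handed to numpy as its "auto" estimator, and that the histogram is computed on weighted data.

```python
def resolve_histogram_bins(bins, weighted):
    """Hypothetical helper: map the configured bin count to np.histogram's `bins` argument.

    bins == 0 is assumed to mean "estimate the number of bins automatically",
    which numpy only supports for unweighted data.
    """
    if bins < 0:
        raise ValueError("plot.histogram.bins must be >= 0")
    if bins == 0:
        if weighted:
            # The missing check: fail early with an actionable message instead of
            # letting numpy raise a TypeError deep inside the histogram code.
            raise ValueError(
                "plot.histogram.bins=0 (automatic estimation) is not supported "
                "for weighted histograms; set a positive number of bins instead"
            )
        return "auto"
    return bins


print(resolve_histogram_bins(50, weighted=True))
print(resolve_histogram_bins(0, weighted=False))
```

Raising here corresponds to disabling the option for weighted data; the alternative is to compute a bin count that does not require the raw rows.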

To Reproduce

Data:

pandas DataFrame (10-row excerpt):
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 6, 2, 1, 1, 1, 1, 1, 1],
    'col2': [49, 92, 22, 115, 112, 84, 14, 53, 14, 13],
    'col3': [3.0, 1.0, 3.0, 7.0, 4.0, 29.0, 4.0, 4.0, 3.0, 10.0],
    'col4': [False, False, False, True, False, False, False, False, False, False],
    'col5': [False, False, False, False, False, False, False, False, False, False],
    'col6': [False, False, False, True, False, False, False, False, False, False],
    'col7': [False, True, False, True, True, False, True, True, True, False],
    'col8': [True, False, True, False, False, True, False, False, False, True],
    'col9': [3, 7, 26, 14, 11, 1, 4, 2, 1, 3],
    'col10': [2.25, 3.9800000190734863, 0.0, 4.289999961853027, 3.569999933242798,
              3.1600000858306885, 4.329999923706055, 3.5, 3.680000066757202, 0.0],
    'col11': [116, 11688, 0, 5941, 11787, 8716, 718, 653, 8544, 0],
    'col12': [0.13655671651983892, 0.08928571428571419, 0.3009205983889529,
              0.11885245901639352, 0.43939393939393945, 0.0,
              0.28411633109619694, 0.05050505050505061, 0.0, 0.025316455696202445],
    'col13': [4, 15, 29, 17, 11, 19, 7, 11, 9, 12],
})

Output of df.info() on the full DataFrame:

 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   col1    110607 non-null  int32
 1   col2    110607 non-null  int32
 2   col3    110600 non-null  float64
 3   col4    110607 non-null  bool
 4   col5    110607 non-null  bool
 5   col6    110607 non-null  bool
 6   col7    110607 non-null  bool
 7   col8    110607 non-null  bool
 8   col9    110607 non-null  int32
 9   col10   110607 non-null  float32
 10  col11   110607 non-null  int32
 11  col12   110607 non-null  float64
 12  col13   110607 non-null  int64

Code:

"""
Test for issue 884:
https://github.com/pandas-profiling/pandas-profiling/issues/884
"""
import pandas as pd
import pandas_profiling

profile = ProfileReport(df, plot={'histogram':{'bins': 0}})

TypeError: Automated estimation of the number of bins is not supported for weighted data
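The message comes from numpy rather than from pandas-profiling itself. Assuming, as the linked code and the reply below suggest, that a bin count of 0 reaches numpy as the "auto" estimator together with per-value weights, the failure can be reproduced with numpy alone. The values and counts are the distinct values of col1 in the excerpt above and how often each occurs; the variable names are illustrative.

```python
import numpy as np

# Distinct values of col1 in the excerpt above and how often each occurs.
values = np.array([1, 2, 6])
counts = np.array([8, 1, 1])

# Unweighted: numpy estimates the bin edges itself without complaint.
raw = np.repeat(values, counts)  # expands back to one entry per row
auto_edges = np.histogram_bin_edges(raw, bins="auto")

# Weighted: the same request raises a TypeError.
error_message = None
try:
    np.histogram(values, bins="auto", weights=counts)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```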

Version information:

  • Python version: 3.8.10
  • Environment: Jupyter Notebook
  • pip:
```
Antergos Linux       2015.10 (ISO-Rolling)
appdirs              1.4.4
backcall             0.2.0
boto3                1.16.7
botocore             1.19.7
certifi              2020.12.5
chardet              4.0.0
cycler               0.10.0
Cython               0.29.23
dbus-python          1.2.16
decorator            5.0.6
distlib              0.3.2
distro-info          0.23ubuntu1
facets-overview      1.0.0
filelock             3.0.12
idna                 2.10
ipykernel            5.3.4
ipython              7.22.0
ipython-genutils     0.2.0
jedi                 0.17.2
jmespath             0.10.0
joblib               1.0.1
jupyter-client       6.1.12
jupyter-core         4.7.1
kiwisolver           1.3.1
koalas               1.8.1
matplotlib           3.4.2
numpy                1.19.2
pandas               1.2.4
parso                0.7.0
patsy                0.5.1
pexpect              4.8.0
pickleshare          0.7.5
Pillow               8.2.0
pip                  21.0.1
plotly               4.14.3
prompt-toolkit       3.0.17
protobuf             3.17.2
psycopg2             2.8.5
ptyprocess           0.7.0
pyarrow              4.0.0
Pygments             2.8.1
PyGObject            3.36.0
pyparsing            2.4.7
python-apt           2.0.0+ubuntu0.20.4.5
python-dateutil      2.8.1
pytz                 2020.5
pyzmq                20.0.0
requests             2.25.1
requests-unixsocket  0.2.0
retrying             1.3.3
s3transfer           0.3.7
scikit-learn         0.24.1
scipy                1.6.2
seaborn              0.11.1
setuptools           52.0.0
six                  1.15.0
ssh-import-id        5.10
statsmodels          0.12.2
threadpoolctl        2.1.0
tornado              6.1
traitlets            5.0.5
unattended-upgrades  0.1
urllib3              1.25.11
virtualenv           20.4.1
wcwidth              0.2.5
wheel                0.36.2
```

ciberger, Nov 15 '21 15:11

Good catch! Automated bin estimation is (apparently) not supported for weighted data, which we use to greatly improve profiling performance. This option should either be disabled for weighted data or implemented for it.

sbrugman, May 02 '22 01:05
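The "implement" route could look roughly like the following sketch. The helper weighted_sturges_histogram is hypothetical, not pandas-profiling's code. It first shows why weighting is safe once the bins are fixed: a histogram over distinct values weighted by their counts matches the histogram over the raw rows. It then picks a bin count for weighted data by swapping in Sturges' rule, k = ceil(log2 n) + 1, which needs only the total count n. This is not equivalent to numpy's "auto", which takes the smaller bin width of the Sturges and Freedman-Diaconis estimators and therefore needs quantiles of the raw data.

```python
import numpy as np


def weighted_sturges_histogram(values, counts):
    """Hypothetical helper: histogram of weighted data with a Sturges bin count.

    values: distinct finite values; counts: how often each value occurs.
    Sturges' rule, k = ceil(log2(n)) + 1, needs only the total number of
    observations n, so the data never has to be expanded row by row.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts)
    n = int(counts.sum())  # total number of observations
    if n < 1:
        raise ValueError("need at least one observation")
    n_bins = int(np.ceil(np.log2(n))) + 1
    return np.histogram(values, bins=n_bins, weights=counts)


# Distinct values of col1 in the excerpt above and how often each occurs.
values = np.array([1, 2, 6])
counts = np.array([8, 1, 1])

hist, edges = weighted_sturges_histogram(values, counts)
print(hist)  # [8 1 0 0 1]

# With the bin edges fixed, histogramming the raw rows gives the same counts,
# which is what makes the weighted shortcut safe.
raw_hist, _ = np.histogram(np.repeat(values, counts), bins=edges)
print(np.array_equal(hist, raw_hist))  # True
```

Sturges' rule tends to under-bin large or skewed samples, so a real fix might prefer a weighted Freedman-Diaconis estimate based on weighted quantiles; the sketch keeps to the simplest rule that works from counts alone.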