
Error when X has boolean columns

Open sebastian-correa opened this issue 3 years ago • 2 comments

Summary

In my project, we have a dataset with 1 bool column (with the rest being normal numeric columns).

I'm trying to make a Hub with many dashboards, but when instantiating the first dashboard I get

numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

The lines that error out are the following, because we're trying to round a bool. https://github.com/oegedijk/explainerdashboard/blob/2a21322ad0f414cbaaa391ef8ce37fb2b96cf80c/explainerdashboard/dashboard_components/overview_components.py#L736-L737

For now, I can circumvent the issue by doing

bool_cols = X.select_dtypes(include="bool").columns
X = X.astype({col: "uint8" for col in bool_cols})
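The same workaround can be wrapped in a small helper so it is applied consistently before the explainer is built. This is only a sketch: the name `bools_to_uint8` is hypothetical and not part of explainerdashboard, and it assumes the bool columns use the plain NumPy `bool` dtype.

```python
import numpy as np
import pandas as pd


def bools_to_uint8(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with every bool column cast to uint8 (hypothetical helper)."""
    bool_cols = X.select_dtypes(include="bool").columns
    # astype with a dict only touches the listed columns; the rest keep their dtype.
    return X.astype({col: "uint8" for col in bool_cols})


# Tiny illustrative frame: one numeric column and one bool column.
X = pd.DataFrame({"col_0": [0.1, 0.2, 0.3], "bool_col": [True, False, True]})
X_fixed = bools_to_uint8(X)
```

The original `X` is left unchanged, so the cast frame can be passed to the model and the explainer while the raw data keeps its bool dtype.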

MWE

Copy this text into test.py and run python test.py.

import numpy as np
import pandas as pd
from explainerdashboard import ExplainerDashboard, ExplainerHub, RegressionExplainer
from xgboost import XGBRegressor

n_rows = 1_000
n_cols = 20

data = {f"col_{i}": np.random.random(n_rows) for i in range(n_cols - 1)}
data["bool_col"] = np.random.randint(0, 2, n_rows, dtype=bool)
X = pd.DataFrame(data)
y = np.random.random(n_rows)

reg = XGBRegressor()
reg.fit(X, y)

explainer = RegressionExplainer(reg, X, y, target="Target", units="u")

dashboards = [
    ExplainerDashboard(
        explainer,
        title="MWE",
        name="MWE",
        description="MWE Dashboard",
        shap_interaction=True,
        shap_dependence=True,
    )
]

hub = ExplainerHub(dashboards, title="MWE", description="MWE")
hub.run(port=8050)

Full traceback

pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
Changing class type to XGBRegressionExplainer...
Generating self.shap_explainer = shap.TreeExplainer(model)
Building ExplainerDashboard..
Warning: calculating shap interaction values can be slow! Pass shap_interaction=False to remove interactions tab.
Generating layout...
Calculating shap values...
ntree_limit is deprecated, use `iteration_range` or model slicing instead.
Calculating predictions...
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
Calculating residuals...
Calculating absolute residuals...
Traceback (most recent call last):
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/me/Dev/app/test.py", line 20, in <module>
    ExplainerDashboard(
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 590, in __init__
    self.explainer_layout = ExplainerTabsLayout(explainer, tabs, title, 
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 104, in __init__
    self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 104, in <listcomp>
    self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_methods.py", line 733, in instantiate_component
    component = component(explainer, name=name, **kwargs)
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/composites.py", line 421, in __init__
    self.input = FeatureInputComponent(explainer, name=self.name+"0",
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 695, in __init__
    self._feature_inputs = [
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 696, in <listcomp>
    self._generate_dash_input(
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 736, in _generate_dash_input
    min_range = np.round(self.explainer.X[col][lambda x: x != self.explainer.na_fill].min(), 2)
  File "<__array_function__ internals>", line 180, in round_
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3773, in round_
    return around(a, decimals=decimals, out=out)
  File "<__array_function__ internals>", line 180, in around
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3348, in around
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

Environment

  • Python 3.9.10
  • explainerdashboard 0.4.0
  • numpy 1.22.3
  • pandas 1.4.1
  • xgboost 1.5.2
  • macOS 12.4 on an Intel Mac

sebastian-correa avatar Jun 29 '22 01:06 sebastian-correa

Couldn't you change the bool columns to a ['0', '1'] int or float column?

oegedijk avatar Jan 01 '23 19:01 oegedijk

Couldn't you change the bool columns to a ['0', '1'] int or float column?

That's the workaround I cited! It's still a workaround and I think Explainer Dashboard should work with bool columns. Why not avoid the np.round operation when the column is bool?
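A rough sketch of what skipping the rounding for bool columns might look like is below. It is not the library's code: `rounded_range` is a hypothetical stand-in for the min/max computation in `_generate_dash_input`, and the `na_fill=-999` default is an assumed placeholder value.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype


def rounded_range(series: pd.Series, na_fill=-999, decimals=2):
    """Return (min, max) of a column, ignoring na_fill; hypothetical helper."""
    values = series[series != na_fill]
    lo, hi = values.min(), values.max()
    if is_bool_dtype(series):
        # Bool columns have nothing to round, so np.round is skipped
        # and the bounds are reported as plain 0/1 integers.
        return int(lo), int(hi)
    return np.round(lo, decimals), np.round(hi, decimals)


bool_range = rounded_range(pd.Series([True, False, True]))
float_range = rounded_range(pd.Series([0.123, 0.987, -999.0]))
```

Checking the dtype of the whole column, rather than of the scalar returned by `min()`, keeps the branch independent of how pandas represents a bool minimum.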

sebastian-correa avatar Jan 09 '23 15:01 sebastian-correa