explainerdashboard
Error when X has boolean columns
Summary
In my project, we have a dataset with 1 bool column (with the rest being normal numeric columns).
I'm trying to build a Hub with many dashboards, but instantiating the first dashboard raises:
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'
The error comes from the following lines, which call np.round on a value taken from a bool column.
https://github.com/oegedijk/explainerdashboard/blob/2a21322ad0f414cbaaa391ef8ce37fb2b96cf80c/explainerdashboard/dashboard_components/overview_components.py#L736-L737
For now, I can circumvent the issue by doing
bool_cols = X.select_dtypes(include="bool").columns
X = X.astype({col: "uint8" for col in bool_cols})
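As a self-contained illustration of that workaround, here is a minimal sketch on a small made-up DataFrame (the column names and values are only for illustration). It casts the bool columns to uint8 and checks that rounding the column minimum then works:

```python
import numpy as np
import pandas as pd

# Toy frame with one bool column; names and values are illustrative only.
X = pd.DataFrame(
    {
        "col_0": [0.1, 0.2, 0.3, 0.4, 0.5],
        "bool_col": [True, False, True, True, False],
    }
)

# Cast every bool column to uint8 so downstream numeric code sees 0/1.
bool_cols = X.select_dtypes(include="bool").columns
X_cast = X.astype({col: "uint8" for col in bool_cols})

# Rounding an integer minimum is a no-op and does not raise.
min_range = np.round(X_cast["bool_col"].min(), 2)
print(X_cast.dtypes)
print(min_range)
```

The cast keeps the information (True becomes 1, False becomes 0), so the model sees the same values as before.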
MWE
Copy this code into test.py and run python test.py.
import numpy as np
import pandas as pd
from explainerdashboard import ExplainerDashboard, ExplainerHub, RegressionExplainer
from xgboost import XGBRegressor
n_rows = 1_000
n_cols = 20
data = {f"col_{i}": np.random.random(n_rows) for i in range(n_cols - 1)}
data["bool_col"] = np.random.randint(0, 2, n_rows, dtype=bool)
X = pd.DataFrame(data)
y = np.random.random(n_rows)
reg = XGBRegressor()
reg.fit(X, y)
explainer = RegressionExplainer(reg, X, y, target="Target", units="u")
dashboards = [
    ExplainerDashboard(
        explainer,
        title="MWE",
        name="MWE",
        description="MWE Dashboard",
        shap_interaction=True,
        shap_dependence=True,
    )
]
hub = ExplainerHub(dashboards, title="MWE", description="MWE")
hub.run(port=8050)
Full traceback
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
Changing class type to XGBRegressionExplainer...
Generating self.shap_explainer = shap.TreeExplainer(model)
Building ExplainerDashboard..
Warning: calculating shap interaction values can be slow! Pass shap_interaction=False to remove interactions tab.
Generating layout...
Calculating shap values...
ntree_limit is deprecated, use `iteration_range` or model slicing instead.
Calculating predictions...
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
Calculating residuals...
Calculating absolute residuals...
Traceback (most recent call last):
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/me/Dev/app/test.py", line 20, in <module>
ExplainerDashboard(
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 590, in __init__
self.explainer_layout = ExplainerTabsLayout(explainer, tabs, title,
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 104, in __init__
self.tabs = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 104, in <listcomp>
self.tabs = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_methods.py", line 733, in instantiate_component
component = component(explainer, name=name, **kwargs)
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/composites.py", line 421, in __init__
self.input = FeatureInputComponent(explainer, name=self.name+"0",
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 695, in __init__
self._feature_inputs = [
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 696, in <listcomp>
self._generate_dash_input(
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 736, in _generate_dash_input
min_range = np.round(self.explainer.X[col][lambda x: x != self.explainer.na_fill].min(), 2)
File "<__array_function__ internals>", line 180, in round_
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3773, in round_
return around(a, decimals=decimals, out=out)
File "<__array_function__ internals>", line 180, in around
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3348, in around
return _wrapfunc(a, 'round', decimals=decimals, out=out)
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 66, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'
Environment
- Python 3.9.10
- explainerdashboard 0.4.0
- numpy 1.22.3
- pandas 1.4.1
- xgboost 1.5.2
- macOS 12.4 on an Intel Mac
Couldn't you change the bool columns to a ['0', '1'] int or float column?
That's the workaround I cited! It's still a workaround and I think Explainer Dashboard should work with bool columns. Why not avoid the np.round operation when the column is bool?
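To make the suggestion concrete, here is a rough sketch of what skipping the rounding for bool columns could look like. The helper name column_range is hypothetical and this is not the library's actual code; the na_fill default of -999 is an assumption about the explainer's fill value.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype


def column_range(series: pd.Series, na_fill: float = -999) -> tuple:
    """Hypothetical helper: (min, max) of a column, ignoring na_fill values."""
    values = series[series != na_fill]
    if is_bool_dtype(values):
        # Booleans have nothing to round; report them as 0/1 integers.
        return int(values.min()), int(values.max())
    # Numeric columns keep the existing two-decimal rounding.
    return float(np.round(values.min(), 2)), float(np.round(values.max(), 2))


X = pd.DataFrame(
    {
        "num_col": [0.123, 0.987, -999.0],  # last value stands in for a filled NaN
        "bool_col": [True, False, True],
    }
)
num_range = column_range(X["num_col"])
bool_range = column_range(X["bool_col"])
print(num_range, bool_range)
```

With a branch like this, numeric columns behave as before and bool columns never reach np.round.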