
String categorical values from LightGBM

Open Guidosalimbeni opened this issue 2 years ago • 7 comments

Hello, great tool and library! Wonder if you can point me in the right direction to solve an issue?

  • With LightGBM we can list the categorical columns, and it is fine if their values are strings.
  • Once in ExplainerDashboard, the code crashes because it cannot handle string values (at least that is my understanding).

What could be a solution in this case? Thanks.

Guidosalimbeni avatar Mar 27 '22 16:03 Guidosalimbeni

Hi @Guidosalimbeni,

So if the model is able to handle categorical values, then ExplainerDashboard should handle them as well. It does at least for CatBoost, so I assume it should work for LightGBM as well.

Do you have some runnable example code that shows the crash or wrong output?

oegedijk avatar Mar 31 '22 20:03 oegedijk

Hi @Guidosalimbeni,

Would you be able to provide any examples of where this broke?

oegedijk avatar May 04 '22 18:05 oegedijk

Hello, I am running into this issue. Here is the error message that I get: TypeError: '<' not supported between instances of 'float' and 'str'

And here is a reproducible example:

import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

from explainerdashboard import ClassifierExplainer, ExplainerDashboard

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
# Convert every object (string) column to the pandas category dtype
df[df.select_dtypes("O").columns] = df.select_dtypes("O").astype("category")
df = df[["Survived", "Age", "Sex", "Embarked"]]
y = df.pop("Survived")
X = df
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Let LightGBM pick up the category columns automatically
model = LGBMClassifier()
model.fit(X_train, y_train, feature_name='auto', categorical_feature='auto')

explainer = ClassifierExplainer(
    model, X_test, y_test,
    labels=['Not survived', 'Survived'])

db = ExplainerDashboard(explainer, title="Titanic Explainer",
                        whatif=False,
                        shap_interaction=False,
                        decision_trees=False)
db.run(port=8051)
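
For what it's worth, the same message can be produced in plain Python whenever a list that mixes strings with a float NaN gets sorted. The sketch below only illustrates that behavior; it is an assumption, not something confirmed in this thread, that missing values in a string column are what trigger the crash here. The `values` list and the helpers `try_sort` and `label_missing` are made up for illustration and are not part of explainerdashboard or LightGBM.

```python
import math

# Hypothetical stand-in for the unique values of a string column that has
# missing entries: a few strings plus a float NaN.
values = ["S", float("nan"), "C", "Q"]


def try_sort(items):
    """Return the sorted list, or the TypeError message if sorting fails."""
    try:
        return sorted(items)
    except TypeError as exc:
        return str(exc)


def label_missing(items, label="MISSING"):
    """Replace float NaN entries with an explicit string label."""
    return [label if isinstance(v, float) and math.isnan(v) else v for v in items]


# Sorting the mixed list fails because str and float cannot be compared.
error_message = try_sort(values)

# Once NaN is replaced by a string label, every item is a str and sorting works.
cleaned = try_sort(label_missing(values))

print(error_message)
print(cleaned)
```

If this does turn out to be the cause, giving missing values an explicit string label in the string columns before building the explainer might be worth a try.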

Dekermanjian avatar Oct 15 '22 02:10 Dekermanjian

I am having the same problem, is there a solution?

ghost avatar Oct 17 '22 17:10 ghost

Yes great, I am still having the same issue.

Guidosalimbeni avatar Oct 18 '22 17:10 Guidosalimbeni

Hi,

I have the same issue due to string values in the data. I'd like to create a dashboard using a fitted TabularPredictor from the AutoGluon library as the model. Is there any solution or update related to this issue?

galievaz avatar Nov 29 '22 12:11 galievaz

I think this issue is more related to the data and how LightGBM is coded.

I stumbled upon this error, but it was an error from LightGBM, not explainerdashboard.

Try the following:

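# Replace "[" with "{", "]" with "}" and "<" with "^" in the column names
df.columns = df.columns.str.translate(str.maketrans({"[": "{", "]": "}", "<": "^"}))
# List any column names that still contain "[", "]" or "<" (should be empty)
df.columns[df.columns.str.contains(r"[\[\]<]")]

This renames any columns whose names contain [, ] or <, and is aimed at the error: TypeError: '<' not supported between instances of 'float' and 'str'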

Do let me know if that solves your issue.
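
To see what the renaming above does on its own, here is a minimal standard-library sketch of the same character mapping applied to a plain list of names. The column names are hypothetical and made up for illustration. Note that the mapping only touches column names, not the values stored in the columns.

```python
# Same mapping as in the snippet above: "[" -> "{", "]" -> "}", "<" -> "^".
table = str.maketrans({"[": "{", "]": "}", "<": "^"})

# Hypothetical column names, made up for illustration.
columns = ["Age", "Fare[<10]", "Sex"]

# Apply the mapping to every name.
renamed = [name.translate(table) for name in columns]

# Names that still contain one of the three characters after renaming.
still_flagged = [name for name in renamed if any(ch in name for ch in "[]<")]

print(renamed)
print(still_flagged)
```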

fjpa121197 avatar Jan 05 '23 17:01 fjpa121197