missing decision types

Open pecto2020 opened this issue 1 year ago • 1 comments

I was trying to create a unified lightgbm model. I fit the model using the tidymodels framework. Unfortunately, I got this error: Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type. My understanding is that there is a problem with decision_type. Checking the model, I noticed that there are thousands of missing values in the decision_type column. Any idea why the decisions are missing and how to solve the issue?

pecto2020 • Aug 24 '23 14:08

Missing values are expected in this column as they occur for every leaf node, so it is unlikely that this is the cause.
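The point about leaf nodes can be checked directly on the tree table. Below is a minimal sketch in base R, run on a toy data frame whose column names mirror those returned by lightgbm::lgb.model.dt.tree(); the values are invented for illustration and do not come from the model in this issue.

```r
# Toy stand-in for the output of lightgbm::lgb.model.dt.tree().
# Values are made up; only the column names follow the real table.
trees <- data.frame(
  tree_index    = c(0L, 0L, 0L, 0L, 0L),
  split_index   = c(0L, 1L, NA, NA, NA),   # set for internal (split) nodes only
  leaf_index    = c(NA, NA, 0L, 1L, 2L),   # set for leaf nodes only
  decision_type = c("<=", "<=", NA, NA, NA),
  stringsAsFactors = FALSE
)

# A row describes a leaf when leaf_index is present
is_leaf <- !is.na(trees$leaf_index)

# Cross-tabulate node kind against missing decision_type
na_by_node <- table(leaf = is_leaf, missing_decision = is.na(trees$decision_type))
print(na_by_node)

# TRUE when decision_type is missing on leaf rows and nowhere else
na_only_on_leaves <- identical(is.na(trees$decision_type), is_leaf)
print(na_only_on_leaves)
```

On a real model, the same check could be run by replacing the toy data frame with the table produced by lgb.model.dt.tree(). If the missing values sit only on leaf rows, they are not the source of the error.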

However, I wasn't able to reproduce this error using the tidymodels framework. Please note that an object of class lgb.Booster must be provided to the lightgbm.unify function (it can be extracted from the fitted workflow with the extract_fit_engine() function, see here). If this does not solve the problem, please provide a reproducible example of the error.

krzyzinskim • Sep 27 '23 13:09

I get this error too and have been able to reproduce it with a toy example.

If the step_dummy() line is uncommented, then it works.

However, lightgbm supports categorical data natively, without the need to dummy-encode these variables. This introduces the decision type ==, where a categorical variable equals a specific value. Judging by the error message, this decision type appears not to be handled by unify(). It can be seen in the object lgb_trees, which has a decision_type column showing the decision used at each split after fitting the model, e.g. for the variable neighbourhood.

library(bonsai)
library(treeshap)
library(tidymodels)
library(shapviz)
library(jsonlite)

set.seed(123)
split <- initial_split(ames, prop = 0.8)
train <- training(split)
test <- testing(split)

recipe <- recipe(train) |> 
  update_role(Sale_Price, new_role = "outcome") |> 
  update_role(-has_role("outcome"), new_role = "predictor") |> 
  # step_dummy(all_nominal_predictors()) |> 
  step_zv(all_predictors()) 

spec <- 
  boost_tree(trees = 100, tree_depth = 6) |> 
  set_engine("lightgbm") |> 
  set_mode("regression")

fit <- workflow() |> 
  add_recipe(recipe) |> 
  add_model(spec) |> 
  fit(data = train)

lgb_trees <- lightgbm::lgb.model.dt.tree(extract_fit_engine(fit))

data <- recipe |>
  prep() |> 
  bake(train |> slice_sample(n = 100), has_role("predictor"))

x <- recipe |>
  prep() |>
  bake(test, has_role("predictor"))

shap <- extract_fit_engine(fit) |> 
  unify(data, type = "numeric") 
#> Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type

Created on 2024-10-01 with reprex v2.1.1
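To find which features produce the == splits, the tree table can be screened before calling unify(). This is a minimal sketch in base R on a toy table. find_unsupported_splits() is a hypothetical helper, not part of treeshap or lightgbm, and the set of supported operators is an assumption inferred from the error message rather than taken from the treeshap source.

```r
# Hypothetical helper (not a treeshap or lightgbm function): returns the
# features whose splits use a decision type outside the `supported` set.
# The default set is an assumption based on the error message above.
find_unsupported_splits <- function(trees,
                                    supported = c("<=", "<", ">=", ">")) {
  is_split <- !is.na(trees$decision_type)          # leaf rows have NA here
  bad <- is_split & !(trees$decision_type %in% supported)
  unique(trees$split_feature[bad])
}

# Toy stand-in for lgb_trees; values are invented for illustration
toy_trees <- data.frame(
  split_feature = c("Gr_Liv_Area", "Neighborhood", NA, NA, NA),
  decision_type = c("<=", "==", NA, NA, NA),
  stringsAsFactors = FALSE
)

flagged <- find_unsupported_splits(toy_trees)
print(flagged)
```

Applied to the lgb_trees object from the reprex, the helper should list the categorical predictors that lightgbm split on with ==. These are the columns that step_dummy() would otherwise turn into numeric indicators.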

cgoo4 • Oct 01 '24 16:10