[Question] Setting values for linear coefficient

Open velezbeltran opened this issue 1 year ago • 5 comments

Summary

Hello! Thank you for the library; it has been invaluable to my work for the past couple of years!

I was wondering whether, from the Python interface, it is possible to manually set the linear model at a leaf when fitting a linear tree. That is, if we train the model with linear_tree=True, is it possible to modify the linear model at each leaf afterwards? If not, I think it would be a useful addition.

Motivation

This would be useful if you want to compute the derivatives of the tree, using the linear model as an approximation, which is what we were planning to use it for. In that case, we could differentiate by modifying the linear model and setting some coefficients to 0.

Description

Essentially, having some function that is similar to set_leaf_output but for the coefficients.

velezbeltran avatar Mar 27 '24 15:03 velezbeltran

Thanks for using LightGBM.

I've edited your post to actually use set_leaf_output in plaintext, so this could be found from search engines.

I think this is an interesting idea. Could you write some pseudo-code showing what you'd like the interface to look like? For example, would it be like this?

Booster.set_linear_leaf_coefficients(
   tree_id=1234,
   leaf_id=5,
   constant=100.5,
   beta=0.89
)

(I don't recall if LightGBM linear models have a constant, would have to double-check)

jameslamb avatar Mar 27 '24 16:03 jameslamb

Thank you for the prompt reply @jameslamb! I collaborate with @velezbeltran.

When linear_tree=True, each leaf has:

  • leaf_const: intercept of the linear model.
  • leaf_features: indices of the numerical features in the leaf's branch.
  • leaf_coeff: slopes of the linear model, one for each feature.
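As a rough illustration of how these three fields fit together, here is a minimal sketch in plain Python. It does not call LightGBM: the helper names `leaf_linear_predict` and `leaf_linear_gradient` are hypothetical, and the sketch assumes a leaf's output is the intercept plus the weighted sum of the listed features, ignoring any special handling of missing values. Under that assumption, the derivative with respect to a feature is its slope if the feature appears in the leaf's model and 0 otherwise.

```python
def leaf_linear_predict(leaf_const, leaf_features, leaf_coeff, x):
    """Evaluate one leaf's linear model on a single row x (hypothetical helper)."""
    return leaf_const + sum(c * x[f] for f, c in zip(leaf_features, leaf_coeff))


def leaf_linear_gradient(leaf_features, leaf_coeff, num_features):
    """Gradient of the leaf's linear model with respect to every input feature."""
    grad = [0.0] * num_features  # features not used by the leaf have zero slope
    for f, c in zip(leaf_features, leaf_coeff):
        grad[f] += c
    return grad


# Illustrative values only, not taken from a trained model.
x = [1.0, 5.0, 0.0, 2.0, -1.0]
pred = leaf_linear_predict(100.5, [0, 3, 4], [0.89, 0.12, 3.14], x)
grad = leaf_linear_gradient([0, 3, 4], [0.89, 0.12, 3.14], len(x))
```

Setting a slope to 0 in `leaf_coeff` removes that feature's contribution to both the prediction and the gradient of that leaf.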

So the interface may look like this:

Booster.set_leaf_linear_model(
   tree_id=1234,
   leaf_id=5,
   constant=100.5,
   features=[0, 3, 4],
   coefficients=[0.89, 0.12, 3.14]
)

To modify the coefficients within a leaf, we need to know which features appear in the leaf's linear model. So the set method would be paired with a get method (similar to get_leaf_output and set_leaf_output):

Booster.get_leaf_linear_model(
   tree_id=1234,
   leaf_id=5
)
# Returns the intercept, features, and slopes of the leaf's linear model.

My understanding is that at the moment the only method to access the linear coefficients is via Booster.dump_model().
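For reference, a minimal sketch of reading the coefficients out of a `Booster.dump_model()` result. The helper name `collect_linear_leaves` is hypothetical, and the surrounding key names (`tree_info`, `tree_index`, `tree_structure`, `left_child`, `right_child`, `leaf_index`) reflect my understanding of the dump layout, so they should be checked against the installed version. The sample dictionary is hand-written to keep the sketch runnable without a trained model.

```python
def collect_linear_leaves(model_dump):
    """Map (tree_index, leaf_index) -> (leaf_const, leaf_features, leaf_coeff)."""
    leaves = {}
    for tree in model_dump["tree_info"]:
        stack = [tree["tree_structure"]]
        while stack:
            node = stack.pop()
            if "left_child" in node:  # internal split node: keep descending
                stack.append(node["left_child"])
                stack.append(node["right_child"])
            else:  # leaf node; linear fields are absent if linear_tree was off
                key = (tree["tree_index"], node.get("leaf_index", 0))
                leaves[key] = (
                    node.get("leaf_const"),
                    node.get("leaf_features", []),
                    node.get("leaf_coeff", []),
                )
    return leaves


# Hand-written stand-in for booster.dump_model(); values are illustrative.
sample_dump = {
    "tree_info": [
        {
            "tree_index": 0,
            "tree_structure": {
                "split_index": 0,
                "left_child": {
                    "leaf_index": 0,
                    "leaf_value": 1.0,
                    "leaf_const": 100.5,
                    "leaf_features": [0, 3, 4],
                    "leaf_coeff": [0.89, 0.12, 3.14],
                },
                "right_child": {
                    "leaf_index": 1,
                    "leaf_value": 2.0,
                    "leaf_const": -3.0,
                    "leaf_features": [],
                    "leaf_coeff": [],
                },
            },
        }
    ]
}

linear_leaves = collect_linear_leaves(sample_dump)
```

With a real model, `sample_dump` would be replaced by `booster.dump_model()`. This only reads the values: editing the returned dictionary does not change the Booster, which is why a setter is being requested.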

aagrande avatar Mar 27 '24 18:03 aagrande

Thanks for that, makes sense to me!

We'd have to figure out specifics like how much validation to do and how to test this, but in general I think this would be a great addition to the library: it would give linear models functionality similar to what set_leaf_output() already provides for regular single-value leaf nodes.

I think we'd want to add this at the level of the C API and keep the logic on the Python side as minimal as possible.

@guolinke @shiyu1994 @jmoralez @borchero @btrotta what do you think about this? I don't think I should be the one to decide alone whether or not we accept an expansion of the library's API like this.

jameslamb avatar Apr 01 '24 03:04 jameslamb

I personally cannot gauge the usefulness of this feature and believe that this is quite a niche requirement. That being said, I also don't see a reason not to expose the coefficients of the linear models via the Python API and, similarly, to allow modifying these values.

Regarding testing, I don't have a lot of concerns: it seems to me like this would essentially be about implementing "getter/setter" methods for the coefficients.

borchero avatar Apr 11 '24 23:04 borchero

> I think we'd want to add this at the level of the C API and keep the logic on the Python side as minimal as possible.

I agree. I can help to implement this feature in the C API.

shiyu1994 avatar Apr 15 '24 16:04 shiyu1994