LightGBM icon indicating copy to clipboard operation
LightGBM copied to clipboard

[R-package] ensure use of interaction_constraints does not lead to features being ignored

Open mayer79 opened this issue 1 year ago • 4 comments

This enhances the R-API of interaction constraints by adding a feature group with those features that do not appear in any of the interaction groups. Currently, these are simply dropped from training, which seems undesirable.

Additionally, it reorganizes the code of the corresponding helper function .check_interaction_constraints().

It solves the R-package part of https://github.com/microsoft/LightGBM/issues/6376. I will attempt a separate PR for the Python-package.

Example

.check_interaction_constraints(list("a", 2:3), letters[1:5])

# Output
[[1]]
[1] "[0]"

[[2]]
[1] "[1,2]"

[[3]]
[1] "[3,4]"

Without the PR, the result is

[[1]]
[1] "[0]"

[[2]]
[1] "[1,2]"

i.e., the last two features are silently dropped from the training.

mayer79 avatar Mar 20 '24 15:03 mayer79

Getting a failing unit test:

  -- 1. Failure (test_lgb.Booster.R:209:5): Loading a Booster from a text file wor

  bst2$params[names(params)] not equal to `params`.

  Component "interaction_constraints": Length mismatch: comparison on first 2 components

I will have a look at it next week (afk).

mayer79 avatar Mar 21 '24 10:03 mayer79

Hey @mayer79 , across your recent PRs I've seen multiple "fix linting" types of commits. Totally fine to keep using Continuous Integration to get that feedback (we don't have a lot of activity going on in the repo right now), but you'd probably find it faster to run the linting locally. It only requires R and the {lintr} package.

Rscript .ci/lint_r_code.R $(pwd)/R-package

jameslamb avatar Mar 27 '24 14:03 jameslamb

Hey @mayer79 , across your recent PRs I've seen multiple "fix linting" types of commits. Totally fine to keep using Continuous Integration to get that feedback (we don't have a lot of activity going on in the repo right now), but you'd probably find it faster to run the linting locally. It only requires R and the {lintr} package.

Rscript .ci/lint_r_code.R $(pwd)/R-package

Thanks, this is the stuff that I should have asked long time ago, but never did :-).

mayer79 avatar Mar 27 '24 15:03 mayer79

Pipeline seems happy @jameslamb - but really no pressure :-)

mayer79 avatar May 14 '24 10:05 mayer79

Thanks very much for this! And I really appreciate you keeping it up to date with master while waiting for reviews.

I just left one suggestion about covering this a bit more thoroughly with tests. If you don't have time in the next week to get to that, let me know if you're open to me writing that test and pushing it here...I'm sorry for the time pressure (especially after not reviewing this for so long 😬 ), but I'm going to try to push for v4.4.0 to get out in the next week, ahead of the numpy 2.0 release (#6439 (comment)), and I'd love to get this change into it if we can.

Thanks so much for the review! I will try to improve the tests this week.

mayer79 avatar Jun 10 '24 05:06 mayer79

It seems that CI here is suffering from this issue: https://github.com/tox-dev/filelock/issues/337

(build link)

That issue was just discovered earlier today, and fixes for it are still rolling out. To keep making progress towards the release, I've just temporarily commented out running pre-commit in CI. I strongly suspect that 24 hours from now, that'll be resolved and we can re-enable it.

jameslamb avatar Jun 13 '24 03:06 jameslamb