bonsai icon indicating copy to clipboard operation
bonsai copied to clipboard

added support for case_weights

Open p-schaefer opened this issue 1 year ago • 14 comments

I noticed there wasn't much progress on integrating case_weights into lightgbm through bonsai so I've taken a crack at adding it. This is in reference to 65.

p-schaefer avatar Dec 14 '23 03:12 p-schaefer

Thanks for the review @simonpcouch. Note, I am also working on a method that allows some ability to customize validation data beyond just specifying a proportion of training data. I find this quite necessary for my work (since my data are not entirely independent - hence needing case weights as well), but I don't know if its robust enough to include in Bonsai proper.

p-schaefer avatar Dec 15 '23 20:12 p-schaefer

Thanks again for your work here. Given the lack of movement on this PR, I'm going to go ahead and close. :)

simonpcouch avatar Apr 07 '24 19:04 simonpcouch

I have found a need for this feature recently, so I may try to resurrect this pull request and finish cleaning it up sometime this week. Anything I should know besides your comments above, @simonpcouch ?

joranE avatar Apr 08 '24 13:04 joranE

That'd be wonderful, @joranE! Conceptually, the implementation here looks solid—following the current review comments would put this PR in a good spot.

simonpcouch avatar Apr 08 '24 13:04 simonpcouch

Sorry, this fell off my radar. I can finish addressing the comments next week, if that works for you.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Simon P. Couch @.> Sent: Monday, April 8, 2024 9:31:10 AM To: tidymodels/bonsai @.> Cc: Patrick Schaefer @.>; Author @.> Subject: Re: [tidymodels/bonsai] added support for case_weights (PR #72)

That'd be wonderful, @joranEhttps://github.com/joranE! Conceptually, the implementation here looks solid—following the current review comments would put this PR in a good spot.

— Reply to this email directly, view it on GitHubhttps://github.com/tidymodels/bonsai/pull/72#issuecomment-2042767586, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACYA7SXU5VRFVBVDIZ322P3Y4KLZ5AVCNFSM6AAAAABAUFJUD6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBSG43DONJYGY. You are receiving this because you authored the thread.Message ID: @.***>

p-schaefer avatar Apr 08 '24 13:04 p-schaefer

@p-schaefer Sounds good, I'll let you tackle it! If you find yourself pulled away from it again for any reason, just let me know.

joranE avatar Apr 08 '24 15:04 joranE

I've fixed the spacing issues, and dealt with the intermediate file from testthat. Should I submit a new PR?

p-schaefer avatar Apr 09 '24 00:04 p-schaefer

You're welcome to push to the same branch! I'll reopen this one.

simonpcouch avatar Apr 09 '24 13:04 simonpcouch

Sorry, forgot to rerun the tests and checks. Looks good to me now, let me know if there is anything else.

p-schaefer avatar Apr 09 '24 16:04 p-schaefer

Thanks for closing the loop on this @p-schaefer! Now I just need to figure out if there's an easy way to tweak this interface to allow for the use of the linear_tree parameter for piecewise linear trees, which "naturally" needs to be passed to lgb.Dataset() as well, instead of lgb.train(). 🙄

joranE avatar Apr 09 '24 17:04 joranE

No problem. It might be worth opening a new issue to discuss how to deal with linear_tree (if you want some help). I think we can probably come up with a suitably tidy workflow for dealing with arguments that should be passed to lgb.Dataset() vs lgb.train(). I've been thinking of playing around with linear_tree as well.

p-schaefer avatar Apr 09 '24 18:04 p-schaefer

I added a new issue here.

joranE avatar Apr 09 '24 18:04 joranE

Test failures are unrelated, feel free to ignore. :)

simonpcouch avatar Apr 09 '24 19:04 simonpcouch

@simonpcouch Was there anything else you need from me?

p-schaefer avatar Apr 12 '24 12:04 p-schaefer

Nice work, @p-schaefer! :)

simonpcouch avatar May 08 '24 16:05 simonpcouch