
Splittable random numbers for reproducible training

Open bfolie opened this issue 3 years ago • 3 comments

Bagger and MultiTaskBagger both train the individual models in parallel. Because the order in which those models are trained is uncontrolled, Lolo random forests are inherently non-reproducible, even if the bagging and the RNGs for the base learners are identical.

There are ways of guaranteeing reproducibility across multiple threads, and we should make use of them. See SplittableRandom in Java and a discussion in the context of NumPy.
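To make the idea concrete, here is a minimal sketch of how `java.util.SplittableRandom` can decouple results from thread scheduling. It is not Lolo's implementation: the class `ReproducibleBagging` and the helper `bootstrapIndices` are hypothetical stand-ins for training one bagged model. The assumption is that each model only draws from the generator it is handed, and that all generators are split off a single seeded root on the calling thread before any parallel work starts.

```java
import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.stream.IntStream;

public class ReproducibleBagging {

    // Hypothetical stand-in for training one bagged model: draws a bootstrap
    // sample of row indices using only the generator it was handed.
    static int[] bootstrapIndices(SplittableRandom rng, int numRows) {
        int[] indices = new int[numRows];
        for (int i = 0; i < numRows; i++) {
            indices[i] = rng.nextInt(numRows);
        }
        return indices;
    }

    static int[][] trainAll(long seed, int numModels, int numRows) {
        SplittableRandom root = new SplittableRandom(seed);

        // Split sequentially on the calling thread, so the generator for
        // model i depends only on the seed and on i, not on scheduling.
        SplittableRandom[] rngs = new SplittableRandom[numModels];
        for (int i = 0; i < numModels; i++) {
            rngs[i] = root.split();
        }

        // The work itself runs in parallel; each task touches only its own
        // generator, and toArray keeps results in index order.
        return IntStream.range(0, numModels)
                .parallel()
                .mapToObj(i -> bootstrapIndices(rngs[i], numRows))
                .toArray(int[][]::new);
    }

    public static void main(String[] args) {
        int[][] first = trainAll(42L, 8, 5);
        int[][] second = trainAll(42L, 8, 5);
        // Same seed, same samples, regardless of which thread ran which model.
        System.out.println(Arrays.deepEquals(first, second)); // prints true
    }
}
```

The design choice that matters is splitting before dispatching: if the tasks instead shared one generator, or split it from inside the parallel section, the draws each model sees would again depend on execution order.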

bfolie, Dec 17 '21 21:12

Hi, how is it going? Is there any update on this issue? Thank you in advance for a brief reply! Best, Christoph

iterateccvoelker, Sep 08 '22 15:09

Thanks for asking, @BAMcvoelker. To be honest, we hadn't thought about it in a while, but after seeing your comment we realized we have all of the tools and just need to thread them through.

We open-sourced our splittable random number library, which means it's available to pull into Lolo. I will pull it in soon and use it to make bagged training reproducible.

bfolie, Sep 16 '22 19:09

Thank you so much @bfolie for the update and for picking up the topic again. I look forward to the update!

iterateccvoelker, Sep 20 '22 19:09