
[RFC] make `deterministic` parameter more thorough?

Open jameslamb opened this issue 1 year ago • 6 comments

Proposal

When deterministic = true in parameters, other configuration values should be overwritten with values that will reduce randomness, even if those values make LightGBM slower or more resource-intensive.

Specifically, deterministic = true should result in:

  • force_row_wise = true
  • gpu_use_dp = true
  • seed = 708 (or some other fixed, positive, non-0 number)
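The proposed semantics could be sketched roughly as below. This is a hypothetical Python illustration only (the actual change would live in the C++ `Config` handling, and `apply_deterministic_overrides` is an invented name, not a LightGBM API):

```python
def apply_deterministic_overrides(params):
    """Sketch of the proposed override behavior for deterministic = true.

    Hypothetical helper mirroring what the C++ side might do;
    not actual LightGBM code.
    """
    params = dict(params)  # don't mutate the caller's dict
    if params.get("deterministic"):
        # force settings that reduce randomness, even if slower
        params["force_row_wise"] = True
        params["gpu_use_dp"] = True
        # only pick a fixed seed if the user hasn't chosen one themselves
        if not params.get("seed"):
            params["seed"] = 708
    return params
```

A user-supplied `seed` would be respected; everything else is overridden unconditionally.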

Motivation

In my experience responding to issues here and on Stack Overflow, LightGBM's sources of non-determinism are often misunderstood. It'd be simpler (for both users and maintainers) if it was possible to get deterministic results by just setting {"deterministic": true}.

Examples of the discussions I'm referring to:

  • #6964
  • #6790
  • https://stackoverflow.com/questions/79212471/why-do-i-get-different-performance-on-different-runs-on-my-ml-model
  • #6683
  • #6671
  • #6604
  • #6203
  • #6094
  • #6069
  • https://github.com/microsoft/LightGBM/issues/5887#issuecomment-1545935847
  • #4722
  • #4562
  • #3761
  • #3654
  • #3372 (the PR that led to deterministic being introduced in #3494)
  • #2085

Description

There are plenty of implementation details that would need to be worked out, but in short, I'm envisioning that this overriding would happen only on the C++ side, not in the wrappers.

And that a new documentation section should be added (separate from the deterministic parameter's entry in https://lightgbm.readthedocs.io/en/latest/Parameters.html), explaining the limitations of this approach and how to further improve reproducibility.

Other Questions

How to handle multithreading with OpenMP?

Proposing we do the following:

  • audit all OpenMP pragmas and confirm that any where ordering matters (e.g. gradient accumulation) disable parallelism when deterministic = true (as in #3494)
  • add test(s) that deterministic = true is sufficient to get completely deterministic results for the CPU version
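To illustrate why accumulation order matters in those pragmas: floating-point addition is not associative, so threads summing the same gradients in a different order can produce different totals. A minimal, self-contained illustration (plain Python, no OpenMP needed to show the effect):

```python
# Floating-point addition is not associative: the order in which
# values are accumulated (e.g. by parallel threads) changes the result.
vals = [1e16, 1.0, -1e16]

# Summing left to right: 1.0 is absorbed into 1e16 and lost.
left_to_right = (vals[0] + vals[1]) + vals[2]

# Summing in a different order: the large terms cancel first, so 1.0 survives.
reordered = (vals[0] + vals[2]) + vals[1]

print(left_to_right, reordered)  # 0.0 1.0
```

This is the same mechanism that makes unordered parallel gradient accumulation non-deterministic across runs.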

But alternatively, we could do something simpler:

  • set num_threads = 1 whenever deterministic = true

Which is preferable?

Reviewers

@shiyu1994 @guolinke @StrikerRUS @borchero @jmoralez @btrotta I'd like to hear your thoughts on this. Thanks for your consideration!

jameslamb avatar Nov 27 '24 04:11 jameslamb

I generally like this! In my experience, it is always slightly annoying to get ML libraries to behave purely deterministically. Having this feature built-in would be pretty nice.

That being said, I find it slightly weird to fix seed to a certain value inside the library code. I don't see how to achieve the desired behavior without it though...

borchero avatar Nov 27 '24 20:11 borchero

I agree that setting deterministic = true should involve setting other required params for reproducibility implicitly. But I'm not sure which particular approach would be better...

StrikerRUS avatar Nov 28 '24 11:11 StrikerRUS

I'm not sure which particular approach would be better...

Does this comment just refer to the 2 different approaches I mentioned for dealing with multi-threading? If so, I think we should start with the other sources of non-determinism (like setting force_row_wise = true) and come back to multi-threading separately.

I find it slightly weird to fix seed to a certain value inside the library code

To be clear, I'm proposing that passing deterministic = true set seed = 708 (or some other non-0 value) only if params does not already set seed to some non-0 value.

That's the simplest way I can think of to achieve this behavior, and I strongly suspect that most users setting deterministic = true would not care about which particular random seed is used, just that the same one is used on every run.

jameslamb avatar Nov 29 '24 03:11 jameslamb

@jameslamb

Does this comment just refer to the 2 different approaches I mentioned for dealing with multi-threading?

Yes, my comment was about those two approaches.

StrikerRUS avatar Nov 29 '24 09:11 StrikerRUS

Fully agree with this proposal. But for the CUDA version, we may still encounter nondeterministic results due to the nondeterministic order of atomic operations when adding up the histograms. So far, I have no idea how to solve this case without sacrificing too much training speed.

shiyu1994 avatar Dec 01 '24 17:12 shiyu1994

Makes sense to me, I definitely think it is acceptable for the CUDA version to have weaker guarantees about determinism in exchange for being much faster.

jameslamb avatar Dec 01 '24 23:12 jameslamb