cub icon indicating copy to clipboard operation
cub copied to clipboard

Address issues with existing determinism guarantees

Open alliepiper opened this issue 2 years ago • 8 comments

Overview

Some cub::Device* algorithms are/were documented to be run-to-run deterministic, but the implementations no longer fulfill that guarantee. This has been a major pain point for several users who were relying on this behavior. Their code does not behave as expected, and they have no way of fixing this without writing their own kernels.

An additional facet of this issue is that the existing non-deterministic algorithms are significantly faster than the former deterministic implementations, and the regression was accidentally introduced years ago specifically to provide better performance. Fixing this involves a trade-off between fixing functionally-regressed code and introducing performance regressions.

Both deterministic and non-deterministic implementations must continue to be available through public API for both sets of users.

Affected Algorithms

The following algorithms are known to violate current or former guarantees:

  • DeviceReduce::ReduceByKey is currently documented to provide run-to-run determinism, but doesn't.
  • DeviceScan::* used to provide the same broken guarantee, and the documentation was temporarily updated pending proper resolution of this issue.

These algorithms also provide such guarantees:

  • DeviceReduce::Reduce (and derived Sum, Min, Max, etc)
  • DeviceSegmentedReduce::Reduce

Requirements

  • Add explicit determinism tests for all affected algorithms that do/did provide determinism guarantees.
  • Update any algorithms that fail the new tests so that the deterministic behavior is restored.
  • Expose both the existing non-deterministic and the updated deterministic implementations via public API.
  • Document variants clearly.
  • Run new determinism tests in CI to guard against future regressions.

Open Questions

How to handle naming?

How should we spell the deterministic and non-deterministic APIs? The choice is between backward compatibility or a consistent, sensible API. I'm interested in hearing user feedback on which solution would be preferable:

Option 1: Prioritize backwards compatibility.

All existing algorithm names that promise(d) determinism have determinism restored and preserved. Non-deterministic implementations of these algorithms will be introduced under a new name, probably prefixed/suffixed (suggestions welcome). New deterministic APIs of existing non-deterministic algorithms will be introduced under a new name with a different prefix/suffix.

Pros:

  • Fixes functional regressions.

Cons:

  • Introduces performance regressions.
  • Inconsistent naming: The "basic" algorithm spelling will be deterministic for some algorithms but non-deterministic for others. Some algorithms will be suffixed when deterministic, other will be suffixed when non-deterministic.

Option 2: Prioritize consistency.

Algorithm implementations with no special guarantees are exposed through the basic spelling of the algorithm (e.g. Reduce).

Algorithms with determinism guarantees are prefixed (e.g. DeviceDeterministicReduce for run-to-run determinism on the same device, DeterministicReduce for consistently deterministic in all cases).

Pros:

  • Naming is consistent.
  • Matches how special requirements and guarantees are handled in other APIs (e.g. DeviceMergeSort has SortKeys, StableSortKeys, and SortKeysCopy).

Cons:

  • Introduces functional regressions as determinism guarantees on existing APIs are broken.

Scope

This issue is limited to testing and fixing pre-existing determinism guarantees in the explicitly listed algorithms. Requests for new or stronger determinism guarantees should go into new issues.

alliepiper avatar May 06 '22 21:05 alliepiper

I prefer option 2. The people who were depending on the determinism are already upset, and I am guessing that they will be (mostly) satisfied with option 2 (because CUB developers are unlikely to make the same mistake again when the function has "Deterministic" in its name). If option 1 is chosen, then a whole new group of users (who care about performance more than determinism) will become upset. Option 2 will result in a happier community (on average, if not everyone individually) and is a better long-term solution.

dkolsen-pgi avatar May 07 '22 19:05 dkolsen-pgi

I also prefer option 2, and completely agree with the points raised above by @dkolsen-pgi.

maddyscientist avatar May 07 '22 20:05 maddyscientist

cc: @kmaehashi (CuPy switched to use CUB by default despite the reproducibility/determinism concern, cupy/cupy#4897)

leofang avatar May 07 '22 23:05 leofang

Glad to see this happening! Although not directly affected anymore, my preference would be Approach 2.

ekelsen avatar May 09 '22 15:05 ekelsen

I also prefer option 2.

duncanriach avatar May 09 '22 23:05 duncanriach

Option 2 :) thanks.

lilohuang avatar May 16 '22 07:05 lilohuang

Is HistogramEven currently deterministic or not?

bhack avatar Jul 10 '22 00:07 bhack

Is HistogramEven currently deterministic or not?

Yes, assuming that your histogram output is an integral type that is either unsigned or won't overflow at runtime.

alliepiper avatar Jul 25 '22 21:07 alliepiper