
Try improving implementation of StochasticAcceptanceBasedWeightedLeftShuffle

ndsvw opened this issue 3 years ago • 2 comments

Question: How can values with different priorities be picked efficiently, respecting those priorities?

The current implementation uses stochastic acceptance and runs in linear time on average (see StochasticAcceptanceBasedWeightedLeftShuffle.cs).

Problem: The max probability value is calculated only once, at the beginning. If there is one value with a high weight (e.g. 10,000) and all the other weights are low (e.g. 1), the `while(true)` loop of the stochastic acceptance is iterated very often for no reason.

But: Updating the max potentially n times makes the algorithm run in O(n^2).

Sorting the values at the beginning, so that the current max is known at every step, makes it O(n*log(n)), which is maybe not the best solution.

Possibly, there is a different approach that runs in O(n) and does not require max-tracking. Or maybe there is a good approach to realize max-tracking without too much additional complexity.
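For illustration, here is a minimal Python sketch of the stochastic-acceptance weighted left shuffle described above (the actual C# implementation lives in StochasticAcceptanceBasedWeightedLeftShuffle.cs; names here are illustrative only). It shows exactly where the problem sits: `w_max` is computed once and never updated, so once the heavy element has been placed, low-weight proposals are still rejected with probability close to 1.

```python
import random

def weighted_left_shuffle(values, weights):
    """Sketch: weighted shuffle via stochastic acceptance, O(n) on average.

    The maximum weight is computed once up front. With one huge weight
    and many tiny ones, almost every proposal in the inner loop is
    rejected, which is the performance problem described in this issue.
    """
    values = list(values)
    weights = list(weights)
    n = len(values)
    w_max = max(weights)  # computed once; never updated afterwards
    for i in range(n):
        # Propose a random element from the remaining window [i, n)
        # and accept it with probability weights[j] / w_max.
        while True:
            j = random.randrange(i, n)
            if random.random() * w_max <= weights[j]:
                break
        values[i], values[j] = values[j], values[i]
        weights[i], weights[j] = weights[j], weights[i]
    return values
```

With weights like `[1, 1, 1, 10_000]`, the first placement is fast (the heavy element is almost always accepted), but every later placement rejects roughly 9,999 of 10,000 proposals because `w_max` still reflects the already-placed heavy element.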

ndsvw · Jul 11 '21, 13:07

Maybe, something like https://rcoh.me/posts/linear-time-median-finding/ or https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/ can be used.
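As a sketch of what those links describe: quickselect finds the k-th smallest element in expected linear time (the second link also covers the worst-case-linear median-of-medians variant, which the sketch below does not implement).

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element (0-indexed) in expected O(n) time.

    Uses a random pivot and three-way partitioning; worst case is
    O(n^2), but the expected number of comparisons is linear.
    """
    arr = list(arr)
    while True:
        pivot = random.choice(arr)
        lows = [x for x in arr if x < pivot]
        pivots = [x for x in arr if x == pivot]
        highs = [x for x in arr if x > pivot]
        if k < len(lows):
            arr = lows  # answer is among the smaller elements
        elif k < len(lows) + len(pivots):
            return pivot  # the pivot itself is the k-th smallest
        else:
            k -= len(lows) + len(pivots)
            arr = highs  # answer is among the larger elements
```

Whether this actually helps here is open: selecting a per-window maximum with quickselect at every step would again cost more than O(n) overall, so it would presumably only be useful as a one-time preprocessing step.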

Or maybe use a fallback method if https://github.com/ndsvw/Fluent-Random-Picker/blob/1985c027a6435f1fefbba77ccf1c5dedc8b8fe69/FluentRandomPicker/Shuffle/PrioritizedLeftShuffle.cs#L73 gets too low.

ndsvw · Dec 04 '21, 08:12

For now: Switched to SortingBasedWeightedLeftShuffle with consistent O(n*log(n)) performance until there is a better solution.

ndsvw · Jun 04 '22, 07:06

Staying with SortingBasedWeightedLeftShuffle...

Benchmark (all generated priorities are between 1 and 4; number of elements = 8,000,000):

```csharp
[MemoryDiagnoser]
public class PickDistinctWithPrioritiesBenchmark
{
    private static readonly System.Random _random = new();
    private static readonly Consumer _consumer = new();

    private static readonly int[] _eightMillionElements = Enumerable.Range(0, 8_000_000).Select(x => _random.Next(1, 5)).ToArray();

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public void PickDistinctWithPriorities(int[] array)
    {
        Out.Of().Values(array).WithWeights(array).PickDistinct(array.Length / 2).Consume(_consumer);
    }

    public IEnumerable<int[]> Data()
    {
        yield return _eightMillionElements;
    }
}
```

Sorting based:

| Method | array | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| PickDistinctWithPriorities | Int32[8000000] | 5.604 s | 0.0352 s | 0.0312 s | 189000.0000 | 27000.0000 | 6000.0000 | 1.1 GB |

Stochastic acceptance based:

| Method | array | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| PickDistinctWithPriorities | Int32[8000000] | 5.060 s | 0.0295 s | 0.0276 s | 188000.0000 | 27000.0000 | 6000.0000 | 1.04 GB |

When dealing with so many elements, space becomes the bigger problem...

ndsvw · Mar 16 '24, 16:03