Fluent-Random-Picker
Try improving implementation of StochasticAcceptanceBasedWeightedLeftShuffle
Question: How can values with different priorities be picked proportionally to those priorities?
The current implementation works with stochastic acceptance and runs in linear time on average (see StochasticAcceptanceBasedWeightedLeftShuffle.cs).
Problem: The maximum weight is calculated only once, at the beginning. If there is one high weight (e.g. 10,000) and all the other weights are low (e.g. 1), the max stays at 10,000 even after the heavy element has been picked, so the `while(true)`
loop for the stochastic acceptance is iterated very often for no reason.
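To make the problem concrete, here is a minimal sketch of stochastic acceptance (not the library's actual code): pick an index uniformly at random, accept it with probability `weight / maxWeight`, otherwise retry. The weights below are illustrative.

```csharp
using System;
using System.Linq;

var rng = new Random();

// Stochastic acceptance: choose a candidate uniformly, accept it with
// probability weights[candidate] / maxWeight, otherwise retry.
int PickIndex(double[] weights, double maxWeight)
{
    while (true) // the rejection loop in question
    {
        int candidate = rng.Next(weights.Length);
        if (rng.NextDouble() * maxWeight < weights[candidate])
            return candidate; // accepted proportionally to its weight
    }
}

// One weight of 10,000 among 99 weights of 1: while maxWeight stays at
// 10,000, each weight-1 candidate is accepted with probability 1/10,000,
// so once the heavy element is gone the loop spins ~10,000 times per pick.
double[] weights = Enumerable.Repeat(1.0, 99).Append(10_000.0).ToArray();
int index = PickIndex(weights, weights.Max());
Console.WriteLine($"picked index {index}");
```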
But: recomputing the max after each pick, potentially n times, makes the algorithm run in O(n^2).
Sorting the values at the beginning, so that the current max is known at every step, makes it O(n*log(n)), which may not be the best solution.
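For reference, one way to get a sorting-based weighted shuffle in O(n*log(n)) is the Efraimidis-Spirakis key method: give each element the key `u^(1/weight)` with `u` uniform in (0,1) and sort descending by key. This is an illustrative sketch, not necessarily how SortingBasedWeightedLeftShuffle is implemented.

```csharp
using System;
using System.Linq;

var rng = new Random();

// Efraimidis-Spirakis weighted shuffle: key = u^(1/weight), sort descending.
// Higher weights tend to produce keys closer to 1 and so sort earlier.
T[] WeightedShuffle<T>(T[] values, int[] weights)
{
    return values
        .Select((v, i) => (Value: v, Key: Math.Pow(rng.NextDouble(), 1.0 / weights[i])))
        .OrderByDescending(x => x.Key)
        .Select(x => x.Value)
        .ToArray();
}

var shuffled = WeightedShuffle(new[] { "a", "b", "c", "d" }, new[] { 1, 1, 1, 10 });
Console.WriteLine(string.Join(",", shuffled)); // "d" is likely (not guaranteed) to come first
```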
Possibly there is a different approach that runs in O(n) and does not require max-tracking. Or maybe there is a good way to implement max-tracking without too much additional complexity.
Maybe, something like https://rcoh.me/posts/linear-time-median-finding/ or https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/ can be used.
Or maybe a fallback method could be triggered if the acceptance probability checked at https://github.com/ndsvw/Fluent-Random-Picker/blob/1985c027a6435f1fefbba77ccf1c5dedc8b8fe69/FluentRandomPicker/Shuffle/PrioritizedLeftShuffle.cs#L73 is getting too low.
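Such a fallback could look roughly like the sketch below: count consecutive rejections and, once they exceed a threshold, rescan the weights to refresh the max in O(n). The threshold and rescan policy here are illustrative assumptions, not the library's code.

```csharp
using System;
using System.Linq;

var rng = new Random();

// Stochastic acceptance with a rescan fallback: after rejectionLimit
// consecutive rejections, refresh maxWeight with an O(n) scan so a stale,
// too-large max cannot keep the loop spinning indefinitely.
int PickIndexWithRescan(double[] weights, ref double maxWeight, int rejectionLimit)
{
    int rejections = 0;
    while (true)
    {
        int candidate = rng.Next(weights.Length);
        if (rng.NextDouble() * maxWeight < weights[candidate])
            return candidate;
        if (++rejections >= rejectionLimit)
        {
            maxWeight = weights.Max(); // O(n) rescan, amortized over many rejections
            rejections = 0;
        }
    }
}

double[] weights = { 1, 1, 1, 1 };
double max = 10_000; // stale max, e.g. after the heavy element was removed
int idx = PickIndexWithRescan(weights, ref max, rejectionLimit: 64);
Console.WriteLine($"picked {idx}, max after pick = {max}");
```

The rescan cost is amortized: it only happens after many wasted iterations, so the common case stays O(1) per attempt while the pathological case is bounded.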
For now: Switched to SortingBasedWeightedLeftShuffle with consistent O(n*log(n)) performance until there is a better solution.
Staying with SortingBasedWeightedLeftShuffle...
Benchmark (all generated priorities are between 1 and 4, number of elements = 8,000,000):
```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using FluentRandomPicker;

[MemoryDiagnoser]
public class PickDistinctWithPrioritiesBenchmark
{
    private static readonly System.Random _random = new();
    private static readonly Consumer _consumer = new();

    // 8,000,000 elements with weights between 1 and 4 (Next's upper bound is exclusive).
    private static readonly int[] _eightMillionElements = Enumerable.Range(0, 8_000_000).Select(x => _random.Next(1, 5)).ToArray();

    [Benchmark]
    [ArgumentsSource(nameof(Data))]
    public void PickDistinctWithPriorities(int[] array)
    {
        Out.Of().Values(array).WithWeights(array).PickDistinct(array.Length / 2).Consume(_consumer);
    }

    public IEnumerable<int[]> Data()
    {
        yield return _eightMillionElements;
    }
}
```
Sorting based:
| Method | array | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| PickDistinctWithPriorities | Int32[8000000] | 5.604 s | 0.0352 s | 0.0312 s | 189000.0000 | 27000.0000 | 6000.0000 | 1.1 GB |
Stochastic acceptance based:
| Method | array | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| PickDistinctWithPriorities | Int32[8000000] | 5.060 s | 0.0295 s | 0.0276 s | 188000.0000 | 27000.0000 | 6000.0000 | 1.04 GB |
When dealing with this many elements, memory becomes the bigger problem...