speed-up `randperm` by using our current `rand(1:n)`
And similarly for randcycle and shuffle.
We had a custom version of range generation for randperm, which was based on the ideas of our previous default range sampler SamplerRangeFast (generate k-bits integers using masking and reject out-of-range ones) and took advantage of the fact that randperm needs to generate rand(1:i) for i = 2:n.
But our current range sampler ("Nearly Division Less") is usually better than this hack, and makes these functions more readable. Typically, for array lengths < 2^20, the new version is faster, but gets slightly slower beyond 2^22.
Here are some speedups:
The slow down for big arrays seems fine to me, but I will see if I can find an easy workaround.
Fix #57771.
Revisiting this: I lost the original benchmark code, so I don't remember if it was using an explicit rng, or the implicit one. Here is a more comprehensive graph, which shows the same tendencies:
However, if #58089 would be merged, the changes here would seem to always be an improvement:
Does this close https://github.com/JuliaLang/julia/issues/57771 ?
Does this close https://github.com/JuliaLang/julia/issues/57771 ?
Yes.
The slow down for big arrays seems fine to me
me as well, given that the larger the shuffled array gets the more likely that the user will want to do it with a distributed algorithm anyway, like the hypothetical https://github.com/JuliaFolds2/OhMyThreads.jl/issues/139 (Rust impl here https://github.com/manpen/rip_shuffle)
Nice reference! But also, with #58089, I believe there won't actually be slowdowns anyway.
since https://github.com/JuliaLang/julia/pull/58089 is merged, is this basically ready besides the doctests?
The test failure looks unrelated, if that's the case I will merge in a couple of days.