Combinatorics.jl
Combinatorics.jl copied to clipboard
Slow permutations
permutations() seems slower than it should be. Below is the example I discovered this with.
I have an array of strings, of length ~2,000. I would like to produce all permutations of lengths 1 to n (n in my application will always be less than or equal to 2).
My attempt is
import Combinatorics
"Return all permutations of sizes 1:`max_length`"
function power_permutations(iterable, max_length)
Iterators.flatten(Combinatorics.permutations(iterable, r) for r in 1:max_length)
end
and simply running through it we get
using BenchmarkTools
@benchmark foreach(identity, power_permutations(collect(1:2_000), 2))
BenchmarkTools.Trial:
memory estimate: 60.80 GiB
allocs estimate: 20000015
--------------
minimum time: 17.592 s (36.62% GC)
median time: 17.592 s (36.62% GC)
mean time: 17.592 s (36.62% GC)
maximum time: 17.592 s (36.62% GC)
--------------
samples: 1
evals/sample: 1
Woah, that is a lot of memory.
Comparing to a similar version in Python.
import itertools
def power_permutations(iterable, max_length):
"Return all permutations of sizes 1:`max_length`"
for r in range(max_length + 1):
yield from itertools.permutations(iterable, r)
%timeit len(list(power_permutations(list(range(2000)), 2)))
484 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
A good deal faster than the Julia version.
With arrays of strings. In Julia:
@benchmark foreach(identity, power_permutations(collect(Iterators.repeated("hello",2_000)), 2))
BenchmarkTools.Trial:
memory estimate: 60.80 GiB
allocs estimate: 20000015
--------------
minimum time: 17.404 s (36.79% GC)
median time: 17.404 s (36.79% GC)
mean time: 17.404 s (36.79% GC)
maximum time: 17.404 s (36.79% GC)
--------------
samples: 1
evals/sample: 1
And in Python:
%timeit len(list(power_permutations(["hello"]*2000, 2)))
473 ms ± 3.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
We see more or less the same performance.
rdeits suggested the following as a possible replacement
there is also this: https://github.com/JuliaLang/Microbenchmarks/issues/14 I (and others) implemented for crawling over all permutations (with on allocations).
I think a faster solution would be valuable. The idea of using a library for this suggests that one expects high performance and doesn't want to implement it herself/himself. As there are solutions pointed out here for this problem, is the only thing that needs to be done a PR with this?
Fixed by #122