Combinatorics.jl icon indicating copy to clipboard operation
Combinatorics.jl copied to clipboard

Slow permutations

Open natemcintosh opened this issue 5 years ago • 3 comments

permutations() seems slower than it should be. Below is the example I discovered this with.

I have an array of strings, of length ~2,000. I would like to produce all permutations of lengths 1 to n (n in my application will always be less than or equal to 2).

My attempt is

import Combinatorics

"Return all permutations of sizes 1:`max_length`"
function power_permutations(iterable, max_length)    
    Iterators.flatten(Combinatorics.permutations(iterable, r) for r in 1:max_length)
end

and simply running through it we get

using BenchmarkTools

@benchmark foreach(identity, power_permutations(collect(1:2_000), 2))

BenchmarkTools.Trial: 
  memory estimate:  60.80 GiB
  allocs estimate:  20000015
  --------------
  minimum time:     17.592 s (36.62% GC)
  median time:      17.592 s (36.62% GC)
  mean time:        17.592 s (36.62% GC)
  maximum time:     17.592 s (36.62% GC)
  --------------
  samples:          1
  evals/sample:     1

Woah, that is a lot of memory.

Comparing to a similar version in Python.

import itertools

def power_permutations(iterable, max_length):
    "Return all permutations of sizes 1:`max_length`"
    for r in range(max_length + 1):
        yield from itertools.permutations(iterable, r)

%timeit len(list(power_permutations(list(range(2000)), 2)))                                                                                       
484 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

A good deal faster than the Julia version.

With arrays of strings. In Julia:

@benchmark foreach(identity, power_permutations(collect(Iterators.repeated("hello",2_000)), 2))
BenchmarkTools.Trial: 
  memory estimate:  60.80 GiB
  allocs estimate:  20000015
  --------------
  minimum time:     17.404 s (36.79% GC)
  median time:      17.404 s (36.79% GC)
  mean time:        17.404 s (36.79% GC)
  maximum time:     17.404 s (36.79% GC)
  --------------
  samples:          1
  evals/sample:     1

And in Python:

%timeit len(list(power_permutations(["hello"]*2000, 2)))                                                                                          
473 ms ± 3.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We see more or less the same performance.

natemcintosh avatar May 14 '20 12:05 natemcintosh

rdeits suggested the following as a possible replacement

natemcintosh avatar May 14 '20 12:05 natemcintosh

there is also this: https://github.com/JuliaLang/Microbenchmarks/issues/14 I (and others) implemented for crawling over all permutations (with on allocations).

kalmarek avatar May 14 '20 13:05 kalmarek

I think a faster solution would be valuable. The idea of using a library for this suggests that one expects high performance and doesn't want to implement it herself/himself. As there are solutions pointed out here for this problem, is the only thing that needs to be done a PR with this?

Wikunia avatar Jul 01 '20 17:07 Wikunia

Fixed by #122

natemcintosh avatar Aug 23 '23 19:08 natemcintosh