Combinatorics.jl icon indicating copy to clipboard operation
Combinatorics.jl copied to clipboard

length(partitions(n)) and BigInt

Open jlapeyre opened this issue 8 years ago • 4 comments

If you change the type of the dict to Dict{BigInt,BigInt} in this line

https://github.com/JuliaMath/Combinatorics.jl/blob/master/src/partitions.jl#L58

then length(partitions(n)) returns correct results for larger values of n. The largest result that can be represented by Int64 is for n=405. If we have two routines and test whether n>405, and use either a big int or machine int Dict, then the results are faster for n<405, but not type stable. If we just always use a big int dict, then the function is typestable, but a bit slower for n<405.

jlapeyre avatar Dec 10 '16 22:12 jlapeyre

Another possibility: the return type of npartitions is the same as its input type. There are two separate Dicts, one for Int and one for BigInt.

jlapeyre avatar Dec 12 '16 10:12 jlapeyre

I'd prefer the second solution. We should also look into making the function thread safe.

andreasnoack avatar Dec 12 '16 14:12 andreasnoack

I agree, the second or third solutions are better: always bigint output, or output type is always the input type.

One more things; there is no reason to use Dict{BigInt,BigInt} rather than Dict{Int,BigInt}. The latter is much faster. Of course the key needs to be converted to Int in case the argument is BigInt.

jlapeyre avatar Dec 13 '16 12:12 jlapeyre

@jlapeyre In fact, instead of a Dict, the partition numbers can all be stored in a Vector, since using the pentagonal number recurrence to compute the n'th partition number will recursively compute the m'th partition number for every m<n anyway. Their's nothing to be gained and everything to be lost by hashing the keys, since they're dense. Knowing this we can also rewrite npartitions to work iteratively instead of recursively so all the recursive dependencies are automatically satisfied at every step, and then we don't need the haskey functionality of the Dict either.

I'm new to GitHub, and haven't figured out the pull request idea yet, but I'm quoting below the code I've been using locally.

let _npartitions = Vector{BigInt}()
    global npartitions
    function npartitions(n::T)::T where {T <: Integer}
        n<=0 && return n==1 ? 1 : 0
        n<=length(_npartitions) && return _npartitions[n] # value was previously computed

        # We haven't computed the table far enough yet.  Let's do it now.
        oldn = length(_npartitions)
        append!(_npartitions,zeros(BigInt,n-oldn))
        
        # The pentagonal numbers are k*(3*k-1)÷2 indexed by [1 -1 2 -2 3 -3 ...]
        # pgk after every while iteration is i-the k'th pentagonal number
        for i=oldn+1:n
            k = 1
            pgk = i-1
            while pgk > 0 # i.e. the k'th pentagonal number <= i
                _npartitions[i] = iseven(k) ?
                    _npartitions[i]-_npartitions[pgk] : _npartitions[i]+_npartitions[pgk]
                k < 0 ? k=-k+1 : k=-k
                pgk = i - (k*(3k-1))>>1
            end
            pgk == 0 && (_npartitions[i] -= (-1)^k)
        end
        return _npartitions[n]
    end
end

I'm also not sure how we're supposed to benchmark memoized functions, but including compile time this gives me the first 40,000 partition numbers in under 5 seconds, as opposed to longer than I've ever been willing to wait for the existing version.

malacroi avatar Dec 02 '20 10:12 malacroi