python-flint icon indicating copy to clipboard operation
python-flint copied to clipboard

Question regarding computation of generalized harmonic numbers

Open michaelstepniczka opened this issue 9 months ago • 7 comments

In FLINT and python-flint, one has access to the harmonic numbers via fmpq.harmonic(n). When looking up the efficient computation of these numbers, I ran into a blog post by the current maintainer of FLINT! I see that the computation in FLINT is done via a balanced-sum, and I was wondering two things:

(1) is there a cache anywhere in the current computation/would it be beneficial to have one in practice? One could for instance define in python my_harmonic() to return the output of the fmpq.harmonic() function, while including e.g. a functools lru_cache before this.

(2) is there a way to access computations of the generalized harmonic numbers harmonic(n,m) such that harmonic(n,1) = harmonic(n)? Namely, $$H_{n,m}=\sum_{k=1}^{n}\frac{1}{k^m}$$, which pops up for instance when rewriting polylogarithms in terms of the Hurwitz zeta function.

I am looking for a way to efficiently compute these generalized harmonic numbers, but am initially unsure as to whether it would be beneficial to implement them from scratch in an analogous way for their own sake, or to use relations such as e.g. $$H_{n,m}=\sum_{k=1}^{n-1}\frac{H_{k,m-1}}{k(k+1)}+\frac{H_{n,m-1}}{n}$$ to reduce the order of the computation to eventually have $H_{n,m}$ written in terms of a sum consisting of $\sim nm$ terms of numbers $H_{1,1},\dots,H_{n,1}$ the ordinary harmonic numbers; therefore, if these numbers are cached, it seems that $H_{n,m}$ and $H_{n,1}$ would require roughly the same amount of computation.

I could be totally off the mark in which case I would appreciate any advice! For instance, the mere rewriting of the summation to decrease the order could be a non-significant computational step when compared to the naive computation itself, as we are again summing over various rational linear combinations. Thank you for any comments!

michaelstepniczka avatar Mar 04 '25 10:03 michaelstepniczka

These are not currently implemented in FLINT itself, but we could add them. Some version of a balanced sum would be best for computing standalone values. There's little reason to use a cache; if you want several values simultaneously, best compute them as a vector using recurrence relations.

fredrik-johansson avatar Mar 04 '25 11:03 fredrik-johansson

Thank you for the quick reply!

I would be very interested in seeing this implemented in FLINT and python-flint; I'm not the most experienced in C/C++, but perhaps this could be a task that's on the more straightforward end of things to do, and I could try giving it a go. Moreover, as I'm interested in the functionality, I would be interested in seeing this implemented sooner rather than later, and so I could try to push this forward! (I don't know how much of a learning curve there would be in determining how to best use the existing FLINT tools available). Alternatively, if this is straightforward enough and would be relatively quick to implement, I would love to see this done! (If this were implemented in FLINT, I also don't know how to get the functionality to be seen in python-flint).

Would you be able to elaborate on computing a collection of several values dependent on one-another as a vector using recurrence relations? Does this mean doing the hard computation once, and then adding/subtracting 1/n's as need be?

michaelstepniczka avatar Mar 04 '25 12:03 michaelstepniczka

If you want, say, $H_{n,m}, H_{n+1,m}, \ldots, H_{n+k,m}$, it would indeed be best to compute $H_{n,m}$ and then add $1/n^m$ terms for the rest. There might be some further optimizations available, especially if one starts from $n = 1$; it could be interesting to have a vector function this specific case in FLINT.

One can maybe do something a bit clever for $H_{n,m}$, $H_{n,m+1}$, $H_{n,m+2}$ too, and for the bivariate sequence as a matrix, but I'm not sure how often one needs those.

fredrik-johansson avatar Mar 04 '25 14:03 fredrik-johansson

Thank you for your advice!

Following the example in the documentation, I wrote an analogous python method for computing these generalized harmonic numbers (in exactly the same style -- nothing new added).

def gen_harmonic_odd_direct(a, b, n, m, d):
    t, v = 0, 1
    if a == 1:
        for k in range(b-1-(b%2), 0, -2):
            while k <= (n >> d):
                d += 1
            r = (2**(m*d)-2**(m*(d-1)))*(k**m)
            t, v = ((2**(d*m)-1)*v + r*t), r*v
        return t, v
    else:
        a += (a % 2 == 0)
        for k in range(a, b, 2):
            t, v = (v+(k**m)*t), (k**m)*v
        return (2**(d*m) - 1) * t, (2**(m*d)-2**(m*(d-1))) * v 

def gen_harmonic_odd_balanced(a, b, n, m, d):
    if b - a < 50: 
        return gen_harmonic_odd_direct(a, b, n, m, d)
    mid = (a+b) // 2 
    t, v = gen_harmonic_odd_balanced(a, mid, n, m, d + (a==1)) 
    u, w = gen_harmonic_odd_balanced(mid, b, n, m, d) 
    return t*w + u*v, v*w 

def gen_harmonic(n,m=1):
    return gen_harmonic_odd_balanced(1, n+1, n, m, 1)

This does indeed run a lot faster than what I was using, namely SymPy version 1.11.1. For instance:

SymPy 1.11.1:

start_time = time.time()
sp.harmonic(20000)
end_time = time.time()
print(end_time-start_time)
>>> 2.9713971614837646

Already implemented FLINT function:

start_time = time.time()
flint.fmpq.harmonic(20000)
end_time = time.time()
print(end_time-start_time)
>>> 0.01215672492980957

New function:

start_time = time.time()
gen_harmonic(20000)
end_time = time.time()
print(end_time-start_time)
>>> 0.020743370056152344

So we see that the FLINT implementation is indeed faster, but is roughly comparable to the native python implementation when compared to SymPy.

When actually generalizing to have $m\neq1$:

SymPy 1.11.1:

start_time = time.time()
sp.harmonic(20000,2)
end_time = time.time()
print(end_time-start_time)
>>> 8.803102970123291

New function:

start_time = time.time()
gen_harmonic(20000,2)
end_time = time.time()
print(end_time-start_time)
>>> 0.033841848373413086

Once again, we blow SymPy out of the water. Note that I am not using %timeit as SymPy does have a built in cache. However, as you mentioned, my use case is actually when I care about computing values of the harmonic function that are relatively near to each other in the first argument. For example:

SymPy 1.11.1

start_time = time.time()
[sp.harmonic(i,10) for i in range(5000)]
end_time = time.time()
print(end_time-start_time)
>>> 3.188575506210327

New function:

start_time = time.time()
[gen_harmonic(i,10) for i in range(5000)]
end_time = time.time()
print(end_time-start_time)
>>> 29.139833450317383

And so the caching of SymPy 1.11.1 prevails in this instance. For my purposes, I might secretly want [gen_harmonic_flint(5*i,10) for i in range(5000)], with some uniform spacing between the values, but they will be close enough in general. I see that on individual values, the implementation used in FLINT is better than SymPy; do you have advice on extending this towards the case where we care about consecutive values? For instance, could we compute the largest values first and work backwards by subtracting off terms $1/k^m$?

Rather, by a vector function for FLINT starting from $n=1$, do you mean a way to compute all the values $H_{1,m},...,H_{n,m}$? While FLINT is faster with these individual values than SymPy, are you suggesting to have an analogous method to SymPy where if you wish to cache all of the intermediate values, you just do the naive summation of $1/k^m$?

Thanks!

michaelstepniczka avatar Mar 05 '25 05:03 michaelstepniczka

The SymPy implementation here is particularly inefficient because it does not make use of FLINT/GMP for the integer/rational arithmetic and it is intended for symbolic inputs so it has a lot of overhead checking for other things rather than just computing the numeric result directly.

If you do the naive sum with python-flint it is a lot faster:

import flint

def harmonic_vector(n, m):
    """[harmonic(i, m) for i in range(n)]"""
    total = flint.fmpq(0)
    vec = [total]
    r = flint.fmpq(1)
    for i in range(1, n):
        total += r ** -m
        vec.append(total)
        r += 1
    return vec

Using gen_harmonic to get an individual value is faster than this but not by orders of magnitude in my timings.

oscarbenjamin avatar Mar 05 '25 13:03 oscarbenjamin

Thank you for the feedback.

Is there a way to implement FLINT functionality with general SymPy functions? I am interested in SymPy for doing some differentiation of gamma functions, but then wish to evaluate these functions, ultimately to exactly to obtain rational values. (In particular, with the polygamma functions, I am only looking at the rational part of the argument, which is given by gamma functions and these generalized harmonic numbers.)

With SymPy, I am using the subs() function, and with FLINT, I have tried looking at the timings directly for a specific value, as well as trying to implement a version of the subs() function to use FLINT functions in the evaluation. The direct evaluation in FLINT is faster than SymPy, but I must be making some conceptual error when trying to use FLINT to evaluate these SymPy expressions automatically.

I use SymPy to obtain an expression:

import sympy as sp

n = sp.symbols('n')
f = sp.gamma(5*n+1)/sp.gamma(n+1)
df = f.diff_rational(n) # I have not defined this in detail here
>>> df = -gamma(1 + 5*n)*harmonic(n)/gamma(1 + n) + 5*gamma(1 + 5*n)*harmonic(5*n)/gamma(1 + n)

Higher derivatives, which I am also interested in, will have harmonic numbers with $m>1$ which is why I was interested in this implementation in the above discussion.

I then wish to evaluate df at many values of $n$. If I run a value in python:

%timeit df.subs({n:2})
>>> 10.6 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

And if I do the same thing in python-flint:

n = flint.fmpz(2)
%timeit (-flint.fmpz.fac_ui(1 + 5*n - 1)*harmonic_vector_cache(n)/flint.fmpz.fac_ui(1 + n -1 ) + 5*flint.fmpz.fac_ui(1 + 5*n-1)*harmonic_vector_cache(5*n)/flint.fmpz.fac_ui(1 + n -1))
>>> 1.7 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

and so I naively expect the ability to see a speedup of ~5x (at least at small values of evaluations of the harmonic function).

Note that I have slightly changed the harmonic_vector function as defined above to only output the desired number rather than the full vector, and I evaluate it from low to high as to not exceed any recursion depths:

@cache
def harmonic_vector_cache(n, m=1):
    if n == 0:
        return 0
    if m == 1:
        return flint.fmpq.harmonic(n)
    prev = harmonic_vector_cache(n-1,m)
    return flint.fmpq(n) ** -m + prev

However, I was looking to try to use python-flint to instead evaluate (in particular) these harmonic numbers. I was doing so by trying to manually replicate the sub function. I was able to take numerator = df.numerator() and denominator = df.denominator(), and loop through with code like:

def eval_harmonic_expr(expr,sub_dict):
    # there is more code here, but I've tried to trim it down to be understandable
    eval_num = flint.fmpz(0)
    expr_num,expr_den = expr.as_numer_denom()
    
    if expr_num.is_Add: # break down sums
        for summand in expr_num.args:
            eval_num += eval_harmonic_expr(summand,sub_dict)
            
    elif expr_num.is_Mul: # break down products
        eval_num = flint.fmpz(1)
        for fac in expr_num.args:
            eval_num *= eval_harmonic_expr(fac,sub_dict)

   # case for powers

    elif expr_num.is_Function:
        if isinstance(expr_num,sp.harmonic): # we have a harmonic function
            if len(expr_num.args) == 1:
                eval_num =  harmonic_vector_cache(int(expr_num.args[0].subs(sub_dict)))
            else:
                eval_num = harmonic_vector_cache(int(expr_num.args[0].subs(sub_dict)),int(expr_num.args[1].subs(sub_dict)))
        else: # we have a gamma function
            eval_num = flint.fmpz.fac_ui(int(expr_num.args[0].subs(sub_dict))-1)

    return eval_num/eval_den

Essentially, I am trying to find every instance of some expression, I have a dictionary for values of the arguments, and I wish to do the evaluation with python-flint rather than SymPy (or SymEngine). However, doing this, I am finding:

%timeit eval_harmonic_expr(df,{n:1})
>>> 48 µs ± 66.4 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I'm sorry if this is outside of the scope of what is normally asked via GitHub issues -- I truly appreciate all of the advice and help in learning how to properly use FLINT and python-flint effectively!

michaelstepniczka avatar Mar 07 '25 04:03 michaelstepniczka

I then wish to evaluate df at many values of n

With SymPy the usual way to do this is with lambdify which turns the expression into a callable function. You can instruct it to use any implementation of functions like harmonic that you supply.

oscarbenjamin avatar Mar 07 '25 18:03 oscarbenjamin