
Added an alternative way to generate first-order multiplets with better scaling


The current method of generating multiplets (splitting each peak into a doublet, once per coupling) scales exponentially with the multiplicity: n successive splittings produce 2^n peaks before coincident lines are combined. An alternative way to generate multiplets was implemented using binomial coefficients. It's faster for high multiplicities (although slower for the simplest cases) while remaining fairly elegant. The new code is not currently called by the other functions; I'll let you judge whether it would be useful elsewhere in the codebase.
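For context, here is a minimal sketch of the binomial idea (hypothetical helper name, not the code from this pull request): for n equivalent couplings of J Hz, the pattern has n + 1 lines spaced J Hz apart with relative intensities C(n, 0) ... C(n, n), so each coupling grows the peak count linearly rather than doubling it. Peaks are (frequency, intensity) tuples, as in nmrsim peaklists.

from math import comb

def binomial_multiplet_sketch(peak, couplings):
    """Split a peak by each (J, n) coupling using binomial coefficients.

    Each (J, n) coupling replaces every peak with n + 1 lines spaced
    J Hz apart, centered on the parent peak, with intensities weighted
    by C(n, k) / 2**n so that total intensity is conserved.
    """
    peaklist = [peak]
    for J, n in couplings:
        new_peaklist = []
        for freq, intensity in peaklist:
            start = freq - n * J / 2  # leftmost line of the pattern
            for k in range(n + 1):
                new_peaklist.append((start + k * J, intensity * comb(n, k) / 2 ** n))
        peaklist = new_peaklist
    return sorted(peaklist)

Note that this sketch does not merge lines that happen to coincide across different couplings (e.g. J = 8 Hz and J = 4 Hz overlap); a real implementation would also need that reduction step before its output could be compared peak-for-peak with multiplet.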

Benchmark results:

Couplings: [(8, 2), (4, 2)]
Current: 0.0466 s for 10000 iterations
New: 0.1685 s for 10000 iterations

Couplings: [(8, 2), (4, 2), (7, 3), (1.5, 3)]
Current: 2.5646 s for 10000 iterations
New: 2.5093 s for 10000 iterations

Couplings: [(8, 8), (4, 5)]
Current: 21.5207 s for 10000 iterations
New: 0.8021 s for 10000 iterations

Couplings: [(13, 8), (7, 5)]
Current: 22.7828 s for 10000 iterations
New: 0.8319 s for 10000 iterations

Verification and benchmarking code:

import time

from nmrsim.firstorder import multiplet, binomial_multiplet

def equivalent(peak1, peak2):
    """Return True if two peaklists match within a 1e-9 tolerance."""
    if len(peak1) != len(peak2):
        return False
    for p1, p2 in zip(peak1, peak2):
        for c1, c2 in zip(p1, p2):
            if abs(c1 - c2) > 1e-9:
                return False
    return True

# Verify that both implementations agree across a range of multiplicities.
for i in range(1, 10):
    for j in range(1, 10):
        ref = multiplet((300, 1), [(13, i), (7, j)])
        comp = binomial_multiplet((300, 1), [(13, i), (7, j)])
        assert equivalent(ref, comp)

N = 10000

# Time both implementations on each set of couplings.
for case in [
    [(8, 2), (4, 2)],
    [(8, 2), (4, 2), (7, 3), (1.5, 3)],
    [(8, 8), (4, 5)],
    [(13, 8), (7, 5)],
]:
    t1 = time.time()
    for _ in range(N):
        multiplet((300, 1), case)
    t2 = time.time()
    print(case)
    print(f"Current: {t2 - t1:.4f} s for {N} iterations")

    t1 = time.time()
    for _ in range(N):
        binomial_multiplet((300, 1), case)
    t2 = time.time()
    print(f"New: {t2 - t1:.4f} s for {N} iterations")
    print("")

I also noticed a slight inconsistency in the notation of the multiplicities. Currently, a coupling of the form (8, 2) produces a triplet with J = 8 Hz, because it splits the signal twice. The docstring of the multiplet function previously stated that (8, 2) would produce a doublet.
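A worked example of the current behavior (the printed values assume intensity is halved at each split and coincident lines are combined, as the equivalence test above implies):

from nmrsim.firstorder import multiplet

# (J, n) = (8, 2): the 300 Hz signal is split twice by 8 Hz,
# giving a 1:2:1 triplet rather than a doublet.
peaks = multiplet((300, 1), [(8, 2)])
print(peaks)  # expected: [(292.0, 0.25), (300.0, 0.5), (308.0, 0.25)]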

RaphaelRobidas avatar Dec 15 '23 16:12 RaphaelRobidas

Brief response for now (traveling for the holidays): Good catch on the documentation error. When I get back from vacation I'll review your new version of the function.

sametz avatar Dec 23 '23 04:12 sametz

Two things:

  1. The next release will require Python 3.8+. That means math.comb from the standard library could be used instead of scipy; I'd like to minimize reliance on external libraries where convenient. Could you try implementing your solution with math.comb (see the snippet after this list)? See this link on Stack Overflow.

  2. How many splittings does it take before your code is faster than the current code? Is there a real-world example that can be used as a benchmark? For hydrocarbons, anything over 8 nuclei would be unusual, but my understanding is that for other spin-half nuclei, such as phosphorus, more complexity can be seen.
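A minimal illustration of the suggested swap, assuming the binomial code currently uses scipy.special.comb:

# from scipy.special import comb  # comb(n, k, exact=True) returns an int
from math import comb  # standard library since Python 3.8; always exact

assert comb(5, 2) == 10  # same result, no scipy dependency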

sametz avatar Jan 10 '24 15:01 sametz

I didn't know about math.comb; very nice.

As for the timing, I ran the benchmarks again, and the new code is now as fast as or faster than the current code:

[(8, 2)]
Current: 0.1899 s for 100000 iterations
New: 0.1985 s for 100000 iterations

[(8, 3)]
Current: 0.3185 s for 100000 iterations
New: 0.2371 s for 100000 iterations

[(8, 4)]
Current: 0.5507 s for 100000 iterations
New: 0.2804 s for 100000 iterations

[(8, 5)]
Current: 1.0068 s for 100000 iterations
New: 0.3223 s for 100000 iterations

[(8, 2), (4, 2)]
Current: 0.5759 s for 100000 iterations
New: 0.5650 s for 100000 iterations

[(8, 2), (4, 2), (7, 3), (1.5, 3)]
Current: 32.0979 s for 100000 iterations
New: 7.9310 s for 100000 iterations

It seems that the implementation in the standard library is a bit faster than the one in SciPy. Overall, the new code no longer seems to have any drawbacks.

RaphaelRobidas avatar Jan 15 '24 16:01 RaphaelRobidas