nmrsim
Added an alternative way to generate first-order multiplets with better scaling
The current method of generating multiplets (splitting repeatedly into doublets) scales exponentially with the multiplicity. This PR adds an alternative implementation based on binomial coefficients. It is faster for high multiplicities (although slower for the simplest cases) while remaining fairly elegant. The new code is not currently used by the other functions; I'll let you judge whether it would be useful elsewhere in the codebase.
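As a rough illustration of the binomial idea (a minimal sketch, not the exact function added by the PR): each coupling (J, n) to n equivalent spin-1/2 nuclei contributes n + 1 lines with relative weights C(n, k) / 2^n at offsets (k - n/2) * J, and the couplings are then combined by summing offsets and multiplying weights. This produces prod(n_i + 1) lines instead of the 2^(sum n_i) intermediate lines created by repeated doublet splitting.

```python
from itertools import product

from scipy.special import comb  # math.comb could replace this on Python 3.8+


def binomial_multiplet_sketch(signal, couplings):
    """Hypothetical sketch of a binomial-coefficient multiplet builder.

    signal    -- (frequency, intensity) of the unsplit signal
    couplings -- list of (J, n) tuples: J in Hz, n equivalent coupled nuclei
    """
    frequency, intensity = signal
    # One (offset, weight) line set per coupling; weights come from Pascal's triangle.
    line_sets = [
        [((k - n / 2) * J, comb(n, k, exact=True) / 2 ** n) for k in range(n + 1)]
        for J, n in couplings
    ]
    peaklist = []
    # Combine couplings: offsets add, weights multiply.
    for combo in product(*line_sets):
        offset = sum(o for o, _ in combo)
        weight = intensity
        for _, w in combo:
            weight *= w
        peaklist.append((frequency + offset, weight))
    return sorted(peaklist)


# Example: [(8, 2), (4, 2)] gives a triplet of triplets -> 3 * 3 = 9 lines
# (the doubling approach would build 2**4 = 16 intermediate lines first).
print(binomial_multiplet_sketch((100, 1), [(8, 2), (4, 2)]))
```

Note that coincident lines from different couplings are not merged in this sketch; a full implementation may need to consolidate them before comparing peaklists.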
Benchmark results:
```
Couplings: [(8, 2), (4, 2)]
Current: 0.0466 s for 10000 iterations
New: 0.1685 s for 10000 iterations

Couplings: [(8, 2), (4, 2), (7, 3), (1.5, 3)]
Current: 2.5646 s for 10000 iterations
New: 2.5093 s for 10000 iterations

Couplings: [(8, 8), (4, 5)]
Current: 21.5207 s for 10000 iterations
New: 0.8021 s for 10000 iterations

Couplings: [(13, 8), (7, 5)]
Current: 22.7828 s for 10000 iterations
New: 0.8319 s for 10000 iterations
```
Verification and benchmarking code:
```python
import time

from nmrsim.firstorder import multiplet, binomial_multiplet


def equivalent(peak1, peak2):
    """Return True if two peaklists match within a 1e-9 tolerance."""
    if len(peak1) != len(peak2):
        return False
    for p1, p2 in zip(peak1, peak2):
        for c1, c2 in zip(p1, p2):
            if abs(c1 - c2) > 1e-9:
                return False
    return True


# Verify that both implementations produce the same peaklists.
for i in range(1, 10):
    for j in range(1, 10):
        ref = multiplet((300, 1), [(13, i), (7, j)])
        comp = binomial_multiplet((300, 1), [(13, i), (7, j)])
        assert equivalent(ref, comp)

# Time both implementations on a few coupling patterns.
N = 10000
for case in [
    [(8, 2), (4, 2)],
    [(8, 2), (4, 2), (7, 3), (1.5, 3)],
    [(8, 8), (4, 5)],
    [(13, 8), (7, 5)],
]:
    t1 = time.time()
    for i in range(N):
        multiplet((300, 1), case)
    t2 = time.time()
    print(case)
    print(f"Current: {t2 - t1:.4f} s for {N} iterations")

    t1 = time.time()
    for i in range(N):
        binomial_multiplet((300, 1), case)
    t2 = time.time()
    print(f"New: {t2 - t1:.4f} s for {N} iterations")
    print("")
```
I also noticed a slight confusion regarding the notation of the multiplicities. Currently, a coupling of the form (8, 2) will produce a triplet with J = 8 Hz, as it causes the signal to split twice. The docstring of the multiplet function previously indicated that (8, 2) would produce a doublet.
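To illustrate, using the same call signature as in the benchmark script above (the 1:2:1 normalization in the comment is my reading of the current behavior):

```python
from nmrsim.firstorder import multiplet

# A coupling of 8 Hz to two equivalent nuclei splits the 100 Hz signal twice,
# giving a triplet (three lines near 92, 100, and 108 Hz in a 1:2:1 ratio),
# not a doublet.
for frequency, intensity in multiplet((100, 1), [(8, 2)]):
    print(f"{frequency:7.2f} Hz  intensity {intensity:.3f}")
```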
Brief response for now (traveling for the holidays): Good catch on the documentation error. When I get back from vacation I'll review your new version of the function.
Two things:

- The next version release will be Python 3.8+. That means `math.comb` from the standard library could be used instead of scipy. I'd like to minimize reliance on external libraries where convenient. Could you try implementing your solution with `math.comb` (see the sketch after this list)? See this link on Stack Overflow.
- How many splittings are needed before your code is faster than the current code? Is there a real-world example that can be used as a benchmark? For hydrocarbons, anything over 8 nuclei would be unusual, but my understanding is that for other spin-half nuclei such as phosphorus more complexity can be seen.
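For reference, a minimal sketch of what the suggested swap looks like (Python 3.8+ only; the actual change made to the PR may differ):

```python
from math import comb  # standard library, Python 3.8+

# math.comb(n, k) returns the same integer binomial coefficient as
# scipy.special.comb(n, k, exact=True), with no SciPy dependency.
# Relative intensities for coupling to n equivalent spin-1/2 nuclei follow
# Pascal's triangle, e.g. n = 4 -> 1:4:6:4:1.
print([comb(4, k) for k in range(4 + 1)])  # [1, 4, 6, 4, 1]
```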
I didn't know about `math.comb`, very nice.
As for the timing, I ran the benchmarks again and the new code is now as fast as or faster than the current code:
```
[(8, 2)]
Current: 0.1899 s for 100000 iterations
New: 0.1985 s for 100000 iterations

[(8, 3)]
Current: 0.3185 s for 100000 iterations
New: 0.2371 s for 100000 iterations

[(8, 4)]
Current: 0.5507 s for 100000 iterations
New: 0.2804 s for 100000 iterations

[(8, 5)]
Current: 1.0068 s for 100000 iterations
New: 0.3223 s for 100000 iterations

[(8, 2), (4, 2)]
Current: 0.5759 s for 100000 iterations
New: 0.5650 s for 100000 iterations

[(8, 2), (4, 2), (7, 3), (1.5, 3)]
Current: 32.0979 s for 100000 iterations
New: 7.9310 s for 100000 iterations
```
It seems that the implementation in the standard library is a bit faster than the one in SciPy. Overall, the new code no longer seems to have any drawbacks.
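For what it's worth, a micro-benchmark along these lines (illustrative only; absolute timings vary by machine) can be used to compare the per-call overhead of the two binomial-coefficient implementations:

```python
import timeit

# Time exact binomial coefficients from the standard library and from SciPy.
stdlib = timeit.timeit("comb(8, 4)",
                       setup="from math import comb", number=1_000_000)
scipy_version = timeit.timeit("comb(8, 4, exact=True)",
                              setup="from scipy.special import comb",
                              number=1_000_000)
print(f"math.comb:          {stdlib:.3f} s")
print(f"scipy.special.comb: {scipy_version:.3f} s")
```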