PyEMD
Cython support
Improving performance by migrating some of the code to Cython.
It may be much easier (and give even faster performance) to use Numba in the code. @laszukdawid @nescirem See: https://numba.pydata.org/numba-doc/latest/user/5minguide.html https://stackoverflow.com/questions/53170786/why-is-cython-so-much-slower-than-numba-when-iterating-over-numpy-arrays https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
Hey everyone, Happy 4.5 years later!
Just wanted to mention that PyEMD 1.4.0 introduced JitEMD (https://github.com/laszukdawid/PyEMD/pull/133). If there's interest in it, I'd likely do a bit more work on it.
I've been postponing work on it since I tried Numba a few years back and it was extremely limited. Then life came in and PyEMD became a much lower priority. A few months ago there was a new wave of interest (tickets), so I thought it might be nice to do some extra work. Numba is now much better, but JIT still isn't the best option here. It takes quite a long time to compile everything, and most of the code was already using low-level NumPy methods, i.e. code written in C. But I realised that many of you use EMD in notebooks (import once and then reuse), in which case 20 s of compiling once and then saving 1 s on each call might be beneficial.
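The trade-off above is a simple amortisation: a one-off compile cost is recovered once the per-call savings add up to it. Below is a minimal sketch using the figures quoted (20 s compile, about 1 s saved per call). The function names are hypothetical, and real timings will depend on the machine and the signal length.

```python
import math


def break_even_calls(compile_s: float, saved_per_call_s: float) -> int:
    """Number of calls needed before the one-off compile time is recovered."""
    if saved_per_call_s <= 0:
        raise ValueError("saved_per_call_s must be positive")
    return math.ceil(compile_s / saved_per_call_s)


def net_saving(compile_s: float, saved_per_call_s: float, calls: int) -> float:
    """Total seconds saved (negative means JIT is still a net loss)."""
    return calls * saved_per_call_s - compile_s


print(break_even_calls(20.0, 1.0))  # 20
print(net_saving(20.0, 1.0, 100))   # 80.0
```

In a notebook session that calls EMD more than about 20 times, the compile cost is paid back. A one-off script that decomposes a single signal would be slower with JIT.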
I'd be curious to hear whether this helps.
I am new to PyEMD. Just a few weeks ago I searched online for the best "EMD" algorithm to see whether I could do something with it in the future. After digging a bit deeper, I found that this package is the most advanced one, since it includes several different implementations: EMD, EEMD and CEEMDAN. I am personally more interested in CEEMDAN (although I don't know the theory in depth; I just have the impression that it is one of the best available today). I tried it with a few tests, and the only major issue for me is speed. The following is the testing code:
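```python
import time
import multiprocessing as mp

import numpy as np
from PyEMD import CEEMDAN

# Per-worker globals, filled in by init_worker() so that the script also
# works with the "spawn" start method (Windows/macOS), not only with "fork".
gather = None
t = None
ceemdan = None


def init_worker(data, time_axis):
    global gather, t, ceemdan
    gather, t = data, time_axis
    ceemdan = CEEMDAN(DTYPE=np.float32, extrema_detection="parabol", trials=20)


def EMDecomp(iy, ix, ia):
    # Decompose a single trace with CEEMDAN, limited by max_imf=3
    return ceemdan(gather[iy, ix, ia], t, max_imf=3)


if __name__ == "__main__":
    dat_file = "aaa.npy"
    gather = np.load(dat_file)
    print(f"{gather.shape = }")  # gather.shape = (9, 9, 8, 151)
    print(f"{gather.dtype = }")  # gather.dtype = dtype('float32')
    ny, nx, na, nt = gather.shape
    t0 = 0.0
    dt = 0.004
    t = np.linspace(t0, t0 + (nt - 1) * dt, num=nt, dtype=np.float32)

    t1 = time.time()
    nproc = 16
    signals = []  # one array of decompositions per iy (previously overwritten each loop)
    with mp.Pool(processes=nproc, initializer=init_worker, initargs=(gather, t)) as pool:
        for iy in range(ny):
            itrmap = [(iy, ix, ia) for ix in range(nx) for ia in range(na)]
            signals.append(np.array(pool.starmap(EMDecomp, itrmap), dtype=np.float32))
    t2 = time.time()
    print(f"\ntotal time: {t2 - t1:.2f} sec\n")
```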
It takes about 40 seconds. Ideally the time would be reduced by a factor of 100, because in my real cases nx * ny is about 1,000,000, na is between 4 and 8, and nt is a bit longer, around 500. However, this may be difficult in Python.
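A rough back-of-envelope extrapolation from these numbers shows why a large speed-up is needed. This sketch assumes that run time scales linearly with the number of traces on the same 16-process pool. By default it ignores the longer real traces. The helper `estimate_hours` is hypothetical and written only for this illustration.

```python
def estimate_hours(measured_s, measured_traces, target_traces, nt_ratio=1.0):
    """Linear extrapolation of wall-clock time in hours.

    Assumes cost per trace is constant (same pool size) and, optionally,
    scales linearly with trace length via nt_ratio (an assumption).
    """
    per_trace_s = measured_s / measured_traces
    return target_traces * per_trace_s * nt_ratio / 3600.0


ny, nx, na = 9, 9, 8
measured_traces = ny * nx * na  # 648 traces in the test
for na_real in (4, 8):
    traces = 1_000_000 * na_real  # nx * ny ~ 1e6 in the real case
    print(na_real, round(estimate_hours(40.0, measured_traces, traces), 1))
# 4 68.6
# 8 137.2
```

That is roughly 3 to 6 days of compute before accounting for the traces being about 500/151 ≈ 3.3 times longer. A 100-fold speed-up would bring it down to roughly 1 to 2 hours under the same assumptions. The `nt_ratio` argument can fold in the longer traces, but linear scaling with nt is itself only an approximation.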
Thanks again for your time and work.
It's been 5 years since the idea came about, and I'm unlikely to do it. I've added JIT support, which should help in the worst cases. I'm playing around with JAX, so if it fits I might update it as well. However, Cython is unlikely to happen, so I'm closing this ticket.