
Improve AirPLS and ArPLS performance - sparse matrix operations

Open paucablop opened this issue 9 months ago • 9 comments

Description

AirPLS (Adaptive Iteratively Reweighted Penalized Least Squares) and ArPLS (Asymmetrically Reweighted Penalized Least Squares) are powerful algorithms for removing complex non-linear baselines from spectral signals. However, their computational cost can be significant, especially when processing large numbers of spectra. Currently, we use the csc_matrix representation from scipy.sparse to optimize performance, but further improvements are needed.
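For context, here is a minimal sketch of the kind of loop that dominates the cost, written for ArPLS following the Baek et al. paper listed under Resources. It is not the chemotools implementation: the function name `arpls_baseline` and the defaults for `lam`, `ratio` and `max_iter` are assumptions made for illustration. Each iteration solves one sparse linear system, which is where the time goes.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve
from scipy.special import expit


def arpls_baseline(y, lam=1e4, ratio=0.01, max_iter=50):
    """Illustrative ArPLS baseline estimate (hypothetical helper, not chemotools code)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator D with shape (n - 2, n)
    D = diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n), format="csc")
    H = lam * (D.T @ D)              # smoothness penalty, pentadiagonal
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        W = diags([w], [0], shape=(n, n), format="csc")
        # One sparse solve per iteration: (W + lam * D'D) z = W y
        z = spsolve((W + H).tocsc(), w * y)
        d = y - z
        neg = d[d < 0]
        if neg.size == 0:
            break
        m, s = neg.mean(), neg.std()
        if s == 0:
            break
        # Logistic weights above the baseline, full weight below it
        w_new = np.where(d >= 0, expit(-2.0 * (d - (2.0 * s - m)) / s), 1.0)
        if np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio:
            break
        w = w_new
    return z
```

Calling `arpls_baseline(spectrum)` returns the estimated baseline; subtracting it from the spectrum gives the corrected signal.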

Attempted Improvements

To improve performance, I tried just-in-time compiling some key functions with numba. However, numba does not support the csc_matrix type, so the code cannot be JIT compiled as it stands. I looked for a numba-compatible representation of sparse matrices but could not find one, so I wrote my own, together with some functions for basic algebra operations on it (code in a Gist). Unfortunately, this did not improve performance over the current implementation.
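One possible direction, offered as a sketch and not as what the Gist does: for a second-order difference penalty, the system matrix W + λDᵀD is symmetric and pentadiagonal, so it can be stored as a plain 3 × n NumPy array in banded form. Plain arrays are a type numba does support, and SciPy can solve such systems directly with `scipy.linalg.solveh_banded`. The helper names `penalty_bands` and `weighted_smooth` below are hypothetical, and whether this is actually faster than the csc_matrix route would need benchmarking.

```python
import numpy as np
from scipy.linalg import solveh_banded


def penalty_bands(n, lam):
    """lam * D.T @ D in upper banded storage (D = second-order differences, n >= 5)."""
    ab = np.zeros((3, n))
    ab[0, 2:] = 1.0                  # second superdiagonal
    ab[1, 1:] = -4.0                 # first superdiagonal ...
    ab[1, 1] = ab[1, -1] = -2.0      # ... with its two boundary entries
    ab[2, :] = 6.0                   # main diagonal ...
    ab[2, [1, -2]] = 5.0             # ... with boundary corrections
    ab[2, [0, -1]] = 1.0
    return lam * ab


def weighted_smooth(y, w, lam=1e4):
    """Solve (W + lam * D.T @ D) z = W y using the banded representation."""
    ab = penalty_bands(len(y), lam)
    ab[2] += w                       # add the diagonal weight matrix W
    return solveh_banded(ab, w * y)  # banded Cholesky solve, upper form by default
```

In a reweighting loop only the main-diagonal row changes between iterations, so the penalty bands could be built once and reused.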

Hacktoberfest Challenge

We invite open source developers to contribute to our project during Hacktoberfest. The goal is to improve the performance of both algorithms.

Here are some ideas to work on:

  • Find a more efficient way to JIT compile the code using tools like numba.
  • Investigate parallel or distributed computing techniques to speed up the processing of multiple spectra.
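As a starting point for the second idea, here is a minimal sketch that treats each spectrum as an independent task, using only the standard library. The function `correct_all` and the stand-in baseline `flat_min_baseline` are hypothetical names used for illustration; a real run would pass in an AirPLS or ArPLS routine. Whether a thread pool gives any speedup depends on how much of the time is spent in compiled code that releases the GIL, so a process pool may be the better fit and this would need measuring.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def flat_min_baseline(y):
    """Stand-in baseline for illustration only: a constant at the spectrum minimum."""
    return np.full_like(y, y.min())


def correct_all(spectra, baseline_fn, max_workers=None):
    """Apply baseline_fn to every row of a 2-D array, one task per spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so rows stay aligned
        baselines = list(pool.map(baseline_fn, spectra))
    return spectra - np.vstack(baselines)
```

Because the spectra are independent, the same pattern carries over to a process pool or to joblib without changing the per-spectrum function.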

How to Contribute

Here are the contributing guidelines.

Contact

We can have the conversation in the Issue or the Discussion.

Resources

Here are some relevant resources and references for understanding the theory and implementation of the AirPLS and ArPLS algorithms:

  • Paper on AirPLS: Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135 (5), 1138-1146 (2010).
  • Paper on ArPLS: S.-J. Baek, A. Park, Y.-J. Ahn, and J. Choo, Baseline correction using asymmetrically reweighted penalized least squares smoothing.

paucablop, Sep 30 '23 12:09