pybaselines
aspls does not fit noisy data like its literature implementation
Description of the problem/new feature
As part of a larger effort to add validation tests for all algorithms, I found that aspls does not fit noisy data in a way that matches its paper implementation. A couple of years ago, when I first noticed this, my knee-jerk fix was to reduce asymmetric_coef to 0.5, thinking the paper may have meant the reciprocal of what was written in the text; I now think that was a red herring. There's a new preprint for a similar algorithm called NasPLS, with (I think) at least one of the same authors as the asPLS paper, which uses the same weighting as asPLS but with an asymmetric coefficient of 4 instead of 2. Their algorithm also fits noise just fine, so reducing asymmetric_coef must not have been the right solution.
Description of a possible solution or alternative
Looking at the asPLS paper, when they discuss finding an appropriate asymmetric coefficient value, they state: "According to PauTa criterion, if a signal is three-sigma from the estimated noise mean, we can consider this area as an absorption peak region". Further, step 2 of the algorithm in the NasPLS preprint says to calculate the mean of the negative residuals; if the mean is not used in the weighting, why calculate it? To me, this implies that both papers implicitly shift the residuals by the mean of the negative residuals within the weighting, matching similar methods like arpls, drpls, etc. Thus, with $r$ the residual, $k$ the asymmetric coefficient, and $\mu^-$ and $\sigma^-$ the mean and standard deviation of the negative residuals, the weighting equation for aspls should be changed from:
$w = \frac {1} {1 + \exp{\left(\frac {k (r - \sigma^-)} {\sigma^-} \right)}}$
to
$w = \frac {1} {1 + \exp{\left(\frac {k (r + \mu^- - \sigma^-)} {\sigma^-} \right)}}$
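To make the difference between the two equations concrete, here is a minimal standard-library sketch of both weightings. It is not the pybaselines implementation: the function names, the `shift_by_mean` flag, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def _logistic(value):
    """Return 1 / (1 + exp(value)) without overflowing for large |value|."""
    if value >= 0:
        exp_neg = math.exp(-value)
        return exp_neg / (1.0 + exp_neg)
    return 1.0 / (1.0 + math.exp(value))


def aspls_weights(residuals, asymmetric_coef=2.0, shift_by_mean=True):
    """Hypothetical helper evaluating the asPLS weighting equations.

    residuals: iterable of r values (data minus baseline estimate).
    shift_by_mean=False gives the current equation; True gives the proposed
    one, which adds the mean of the negative residuals to r.
    """
    residuals = list(residuals)
    negative = [r for r in residuals if r < 0]
    if len(negative) < 2:
        raise ValueError("need at least two negative residuals")

    # Assumption: population standard deviation; the papers may intend
    # the sample standard deviation instead.
    sigma_neg = statistics.pstdev(negative)
    if sigma_neg == 0:
        raise ValueError("negative residuals have zero spread")
    mu_neg = statistics.fmean(negative) if shift_by_mean else 0.0

    return [
        _logistic(asymmetric_coef * (r + mu_neg - sigma_neg) / sigma_neg)
        for r in residuals
    ]


if __name__ == "__main__":
    example = [-1.0, -0.5, 0.25, 1.0, 3.0]
    print(aspls_weights(example, shift_by_mean=False))
    print(aspls_weights(example, shift_by_mean=True))
```

Because the mean of the negative residuals is itself negative, the shifted version moves the point where the weight drops to 0.5 further above the baseline, so points inside the noise band keep higher weights.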
With this change, aspls fits noisy data in a way that matches its paper implementation, as shown below (data mirrors Figures 5a and 5c from the asPLS paper), and the root mean square error for the fits is now lower than that of arpls, matching Table 2 in the asPLS paper.