lmfit-py Add Pearson4 fit model

Add Pearson4 fit model

Open lellid opened this issue 1 year ago • 4 comments

Description

This adds the Pearson4 fit model, which is a skewed version of Pearson7.

Type of Changes

[ ] Bug fix
[x] New feature
[ ] Refactoring / maintenance
[ ] Documentation / examples

Tested on

Python: 3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)] lmfit: 0.0.post2626+gab21f84, scipy: 1.7.3, numpy: 1.21.5, asteval: 0.9.27, uncertainties: 3.1.7

Verification

Have you

[x] included docstrings that follow PEP 257?

[x] verified that existing tests pass locally?

[x] verified that the documentation builds locally?

[x] squashed/minimized your commits and written descriptive commit messages?

[ ] added or updated existing tests to cover the changes? (not necessary, new model)
[x] updated the documentation and/or added an entry to the release notes (doc/whatsnew.rst)?
[ ] added an example? (not necessary, only a model)

Aug 05 '22 13:08 lellid

@lellid Thanks - I think this would be a valuable addition.

At first glance, I also think it needs some work, though not too much. The first thing I notice is some confusion about "height" and "amplitude". This seems to be an endless source of confusion, but we are (maybe only I am) trying to take a firm and clear stand.

In all of the other lmfit models (or at least "to the best of our abilities"), the term "amplitude" means a multiplicative factor for the unit-normalized lineshape. It is the area under the curve. The term "height" refers to the maximum value the function would take, typically at or at least near the "center". For most of the models, we try to report the value (or a sensible estimate) of "height" and "fwhm" derived from the values of "amplitude" and "sigma".
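To make the two terms concrete, here is a minimal sketch using a unit-normalized Gaussian written from scratch in NumPy. It is an illustrative stand-in, not lmfit's own code; the helper name `gaussian_sketch` and the sample values are assumptions. "amplitude" scales the unit-area lineshape and so equals the area, while "height" and "fwhm" are derived from "amplitude" and "sigma".

```python
import numpy as np


def gaussian_sketch(x, amplitude=1.0, center=0.0, sigma=1.0):
    """Unit-normalized Gaussian scaled by amplitude (hypothetical helper)."""
    norm = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return amplitude * norm * np.exp(-((x - center) ** 2) / (2.0 * sigma**2))


amplitude, center, sigma = 5.0, 1.0, 0.8
x = np.linspace(-10.0, 10.0, 20001)
y = gaussian_sketch(x, amplitude, center, sigma)

# "amplitude" is the area under the curve (simple Riemann sum here).
area = np.sum(y) * (x[1] - x[0])

# "height" and "fwhm" are derived quantities, not fit arguments.
height = amplitude / (sigma * np.sqrt(2.0 * np.pi))
fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma

print(area, height, y.max(), fwhm)
```

For a Gaussian these derived expressions are exact; for other lineshapes they may only be estimates.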

The function arguments should be "amplitude", "center", and "sigma" - this helps make it easy to switch lineshape functions. If other parameters are needed, they should be listed after those.

If there are conflicting definitions, we tend to prefer the definition at Wikipedia, not because it is correct but because it is the most discoverable and will remain that way for the foreseeable future. So, I would guess that Pearson-IV should use something like (sorry for attaching an image, the Wikipedia links and equation numbers seem to be missing).

with lambda being replaced by center, and alpha being replaced by sigma, and with m and nu being new parameters. I have no idea whether default values can be used for these. Wikipedia suggests that this is normalized, though I have to admit that is not obvious to me! Anyway, I would guess that this is the most complete formula to use, and that amplitude would just be a scaling factor to that.
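The attached image is not preserved in this copy of the thread. For reference, the Pearson type IV density as it is usually stated (written out from memory here, so it should be checked against the Wikipedia article) and the form obtained after the substitutions described above would be:

```latex
% Wikipedia form, valid for m > 1/2
p(x) = \frac{\left|\dfrac{\Gamma\!\left(m + \frac{\nu}{2} i\right)}{\Gamma(m)}\right|^{2}}
            {\alpha \, \mathrm{B}\!\left(m - \frac{1}{2}, \frac{1}{2}\right)}
       \left[1 + \left(\frac{x - \lambda}{\alpha}\right)^{2}\right]^{-m}
       \exp\!\left[-\nu \arctan\!\left(\frac{x - \lambda}{\alpha}\right)\right]

% With \lambda -> center (\mu), \alpha -> sigma (\sigma), and amplitude A as an overall scale
f(x; A, \mu, \sigma, m, \nu) = A \,
       \frac{\left|\dfrac{\Gamma\!\left(m + \frac{\nu}{2} i\right)}{\Gamma(m)}\right|^{2}}
            {\sigma \, \mathrm{B}\!\left(m - \frac{1}{2}, \frac{1}{2}\right)}
       \left[1 + \left(\frac{x - \mu}{\sigma}\right)^{2}\right]^{-m}
       \exp\!\left[-\nu \arctan\!\left(\frac{x - \mu}{\sigma}\right)\right]
```

For \nu = 0 the gamma-function ratio equals 1 and the remaining integral of [1 + u^2]^{-m} is B(m - 1/2, 1/2), which is one way to see that the expression is unit-normalized in the symmetric case.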

Does that seem sensible to you?

I also have no idea how to estimate "height" or "fwhm" or how to guess the sigma or amplitude parameter, let alone m and nu!

Thanks!

Aug 05 '22 16:08 newville

Hi,

thank you for commenting on this. I did not confuse height and amplitude. In the provided function, the (primary) height parameter really is the maximum function value, and the amplitude is provided as a calculated parameter, so that the area under the function is available to the user. I would have named it area, but I used the convention all the other functions are using.

I decided to do that because, in the version where the amplitude is the primary parameter, the prefactor includes the calculation of a complex gamma function, another gamma function and a beta function, which seems rather expensive, since you have to do it for every point, and for the calculation of the Jacobian. But of course, besides this, I can simply put the prefactor in the function.

The other thing is that the original function, which you showed above, has its maximum not at the position lambda, but elsewhere, which makes it a little awkward when you try to fit peaks. That's why I used a shifted version, which has its maximum position exactly at lambda.
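The offset of the maximum can be checked directly. Setting the derivative of the Wikipedia-form shape to zero gives a peak at lambda - alpha*nu/(2*m). The sketch below compares that expression with a numerical maximum; the helper name `pearson4_shape` and the parameter values are assumptions for illustration, and the normalization prefactor is left out because it does not move the peak.

```python
import numpy as np


def pearson4_shape(x, center, sigma, m, nu):
    """Unnormalized Pearson IV shape (hypothetical helper, prefactor omitted)."""
    u = (x - center) / sigma
    return (1.0 + u * u) ** (-m) * np.exp(-nu * np.arctan(u))


center, sigma, m, nu = 0.0, 1.5, 2.0, 3.0
x = np.linspace(-10.0, 10.0, 200001)

# Location of the maximum found on the grid.
numeric_peak = x[np.argmax(pearson4_shape(x, center, sigma, m, nu))]

# Location predicted by d/du [-m*log(1 + u^2) - nu*arctan(u)] = 0, i.e. u = -nu/(2m).
analytic_peak = center - sigma * nu / (2.0 * m)

print(numeric_peak, analytic_peak)
```

A shifted variant that peaks exactly at its position parameter can be obtained by evaluating the same shape at `x - sigma * nu / (2 * m)`, which is one reading of the "shifted version" described above.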

I'm willing to add the Wikipedia version as Pearson4, but would rather like to add the shifted height version too, because it is much better suited for fitting peaks, and calculates much faster. Then which name should I give the shifted version?

Aug 05 '22 20:08 lellid

@lellid

> I did not confuse height and amplitude. In the provided function, the (primary) height parameter really is the maximum function value, and the amplitude is provided as a calculated parameter, so that the area under the function is available to the user. I would have named it area, but I used the convention all the other functions are using.

Well, the second argument of the function is called and means "height". It is not called and does not mean "amplitude". That is different from somewhere between 15 and 20 existing peak-like functions in the lmfit.lineshapes module. It would be the only one in which "height" was a named argument.

> I decided to do that because, in the version where the amplitude is the primary parameter, the prefactor includes the calculation of a complex gamma function, another gamma function and a beta function, which seems rather expensive, since you have to do it for every point, and for the calculation of the Jacobian. But of course, besides this, I can simply put the prefactor in the function.

Well, I appreciate that "you decided". But if we're going to include this in lmfit as what we can support and explain as useful to a range of people from different disciplines, I think we would prefer some justified and documented definition for a distribution function.

The gamma and beta functions for the scaling factors would have to be calculated for each function evaluation, not for each independent "x" point. I sort of doubt that would be slow compared to doing a least-squares fit, but I could be persuaded otherwise. Still, we do such normalization for other lineshapes so that they are (at least approximately) unit-normalized, which is helpful when applying to many different data ranges and scientific disciplines.
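A minimal sketch of that point, assuming the Wikipedia form with lambda replaced by center and alpha replaced by sigma: the gamma and beta terms reduce to a single scalar computed once per call, and only the bracket and arctan terms are evaluated per point. The function name `pearson4_sketch` and the argument names `m` and `nu` are illustrative assumptions, not lmfit's API; `scipy.special.loggamma` and `scipy.special.betaln` are used for the prefactor.

```python
import numpy as np
from scipy.special import betaln, loggamma


def pearson4_sketch(x, amplitude=1.0, center=0.0, sigma=1.0, m=2.0, nu=0.0):
    """Wikipedia-form Pearson IV scaled by amplitude (illustrative, needs m > 1/2)."""
    # Scalar log-prefactor: evaluated once per call, regardless of len(x).
    # |Gamma(m + i*nu/2) / Gamma(m)|^2 = exp(2*Re(loggamma(m + i*nu/2)) - 2*loggamma(m))
    log_norm = (2.0 * (loggamma(m + 0.5j * nu).real - loggamma(m).real)
                - betaln(m - 0.5, 0.5) - np.log(sigma))
    # Per-point part: only cheap elementwise operations.
    u = (np.asarray(x, dtype=float) - center) / sigma
    return amplitude * np.exp(log_norm - m * np.log1p(u * u) - nu * np.arctan(u))


# Numerical check that amplitude behaves as the area under the curve.
xs = np.linspace(-200.0, 200.0, 400001)
ys = pearson4_sketch(xs, amplitude=3.0, center=0.5, sigma=1.2, m=2.0, nu=1.0)
area = np.sum(ys) * (xs[1] - xs[0])
print(area)
```

Working with logarithms of the gamma and beta functions is a design choice made here to avoid overflow for large `m`; whether the prefactor cost matters in a real fit would need to be measured.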

> The other thing is that the original function, which you showed above, has its maximum not at the position lambda, but elsewhere, which makes it a little awkward when you try to fit peaks. That's why I used a shifted version, which has its maximum position exactly at lambda.
>
> I'm willing to add the Wikipedia version as Pearson4, but would rather like to add the shifted height version too, because it is much better suited for fitting peaks, and calculates much faster. Then which name should I give the shifted version?

The idea for the PR is to add it as a built-in lineshape for lmfit. If someone else wants to use the Pearson IV function, they could look at the lmfit documentation and say, "Oh, great, it's already builtin". Will she be expecting your definition, the Wikipedia definition, or something else? I do not know -- I have no experience with this distribution or its variations. But, I know that we can at least explain "we use the Wikipedia definition, go take it up with the Wikipedia math/stats community".

So, if your definition is what you want, go ahead and write your own, and call it whatever you want - I have no complaints about that at all. If the idea is that someone else is going to come across this and use it a few years from now, let's opt for something that might be justifiable as "the standard definition".

Aug 05 '22 23:08 newville

OK, then let's implement the standard version. I will then include the two calculated parameters, height and position.

Aug 06 '22 14:08 lellid

closing, as replaced by #800

Sep 04 '22 16:09 newville
