lmfit-py icon indicating copy to clipboard operation
lmfit-py copied to clipboard

Add Pearson4 fit model (original Wikipedia version)

Open lellid opened this issue 2 years ago • 4 comments

Description

This adds the Pearson4 fit model (original version from Wikipedia), which is a skewed version of Pearson7.

Type of Changes
  • [ ] Bug fix
  • [x] New feature
  • [ ] Refactoring / maintenance
  • [ ] Documentation / examples
Tested on

Python: 3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)] lmfit: 0.0.post2625+g0b9947a.d20220808, scipy: 1.9.0, numpy: 1.23.1, asteval: 0.9.27, uncertainties: 3.1.7

Verification

Have you

  • [x] included docstrings that follow PEP 257?
  • [x] verified that existing tests pass locally?
  • [x] verified that the documentation builds locally?
  • [x] squashed/minimized your commits and written descriptive commit messages?
  • [ ] added or updated existing tests to cover the changes? (not neccessary, new model))
  • [X] updated the documentation and/or added an entry to the release notes (doc/whatsnew.rst)?
  • [ ] added an example? (not neccessary, only a model)

lellid avatar Aug 08 '22 14:08 lellid

@lellid Thanks! This looks like a very good start to me, and I'll try to take a more careful look at it. FWIW, I didn't understand the failures (files were modified by isort?). Maybe @reneeotten can comment on that...

newville avatar Aug 09 '22 21:08 newville

FWIW, I didn't understand the failures (files were modified by isort?). Maybe @reneeotten can comment on that...

I can, the changes/additions to import packages do not follow the guidelines of the pre-commit hooks we use. That's something that could be verified locally before submitting a PR, but in short the diff for lineshapes.py in the imports should read like:

-from numpy import (arctan, copysign, cos, exp, isclose, isnan, log, pi, real,
-                   sin, sqrt, where)
+from numpy import (arctan, copysign, cos, exp, isclose, isnan, log, log1p, pi,
+                   real, sin, sqrt, where)
+from scipy.special import betaln as betalnfcn
 from scipy.special import erf, erfc
 from scipy.special import gamma as gamfcn
+from scipy.special import loggamma as loggammafcn
 from scipy.special import wofz

reneeotten avatar Aug 10 '22 00:08 reneeotten

@reneeotten Ah, thanks -- "imports out of preferred order"!

newville avatar Aug 10 '22 00:08 newville

@reneeotten Ah, thanks -- "imports out of preferred order"!

well... the main thing is you're not supposed to have several "import ..... as ....." on one line as was done in this PR: from scipy.special import gamma as gamfcn, loggamma as loggammafcn, betaln as betalnfcn

reneeotten avatar Aug 10 '22 01:08 reneeotten

Hi,

I just updated this PR. I

  • added tests (code coverage is not complaining any more)
  • put the import statement in lineshapes.py on separate lines
  • improved the documentation in models.py
  • added Wiki link in lineshapes.py

lellid avatar Aug 15 '22 10:08 lellid

@lellid @reneeotten if I understand correctly, we're all in favor of merging this, except for some "not-quite-as-beautiful-as-we-like" import statements. Please fix those so we can merge.

newville avatar Sep 04 '22 16:09 newville

I used iSort to sort the using statements, so now it passes the tests.

lellid avatar Sep 05 '22 19:09 lellid

@lellid Thanks! I'll plan to merge in a couple of days to give time for final comments or suggestions.

newville avatar Sep 05 '22 19:09 newville

@lellid Thanks! I'll plan to merge in a couple of days to give time for final comments or suggestions.

thanks @lellid - for me the two comments about the docstrings (I left them again above) remain, after making these changes - or explaining why I'm wrong ;) - the commits should be squashed into one - then it's ready-to-go!

reneeotten avatar Sep 07 '22 02:09 reneeotten

@reneeotten @lellid Yes, let's please have consistent code and docstrings! I think we are happy to merge this, but we kind of need a version we can support, which does include being able to explain the actual definition used. We discussed this earlier at #799, but this PR was created as a separate PR after that discussion.

To reiterate, I strongly suggest following the Wikipedia definition or having a really clearly documented explanation of why that form was used instead of the Wikipedia definition.

newville avatar Sep 07 '22 04:09 newville

Hi,

I consolidated the duplication of the docstring.

To the other question about the Wikipedia definition and my implementation:

It is simply a numerical thing:

Of course I can use a naive implementation of $(1+arg^2)^{-m}$, but guess what happens if m is 1000 and arg is 10? Underflow. Or for instance, what is $Gamma(1000)$? Overflow! Finally, both terms together would give some reasonable result, but first, you get underflow multiplied by overflow.

So I logarithmize everything, and put all terms then in the $\exp$ function. For instance, $(1+arg^2)^{-m} = \exp(-m \times \log(1+arg^2)) = \exp(-m \times log1p(arg^2))$

I have seen other rather naive implementations in your code (for instance, Pearson7). Maybe, in the future, I will make some pull requests to correct that.

lellid avatar Sep 07 '22 13:09 lellid

I consolidated the duplication of the docstring.

thanks!

To the other question about the Wikipedia definition and my implementation:

It is simply a numerical thing:

Of course I can use a naive implementation of (1+arg2)−m, but guess what happens if m is 1000 and arg is 10? Underflow. Or for instance, what is Gamma(1000)? Overflow! Finally, both terms together would give some reasonable result, but first, you get underflow multiplied by overflow.

So I logarithmize everything, and put all terms then in the exp function. For instance, (1+arg2)−m=exp⁡(−m×log⁡(1+arg2))=exp⁡(−m×log1p(arg2))

That might be clear to you since you've probably stared at that equation for a while and looked at the math, but from a first read it wasn't to me. I also haven't checked that this is all implemented correctly, but I assume that you did... Of course all of this could be considered a "me problem" but perhaps you could just add a note to the function in lineshapes.py to mention it; just so that one doesn't need to go back to this PR in a few months/years to remember this ;)

I have seen other rather naive implementations in your code (for instance, Pearson7). Maybe, in the future, I will make some pull requests to correct that.

Well, naive or not it appears to have been working well for people for quite some time - of course improvements are always welcome.

Thanks again @lellid - this PR should be almost at the finish line.

reneeotten avatar Sep 08 '22 01:09 reneeotten

Hi,

what I was really wondering about was why you don't have any tests for the function values in your code? At least I haven't seen any. Would make those numerical tweakings a lot more secure, if you can be sure that the values come out that you expect.

lellid avatar Sep 08 '22 07:09 lellid

@lellid

I consolidated the duplication of the docstring.

OK, thanks. Well, at some point we might fix the capital "I" to "j" (Python's preferred sqrt(-1)).

To the other question about the Wikipedia definition and my implementation:

It is simply a numerical thing:

Of course I can use a naive implementation of , but guess what happens if m is 1000 and arg is 10? Underflow. Or for instance, what is ? Overflow! Finally, both terms together would give some reasonable result, but first, you get underflow multiplied by overflow.

So I logarithmize everything, and put all terms then in the function

These functions are kind of prone to under/overflow. I would not necessarily agree that it is obvious that taking log and then doing exp of that will help in general, but that's to say that it would not help here...

I have seen other rather naive implementations in your code (for instance, Pearson7). Maybe, in the future, I will make some pull requests to correct that.

Do you mean to use scipy.special.beta instead of using the equivalent with gamma? Yeah, that would be okay.

But, I'm not sure that the general approach of "work in log space and then take exponential" is necessarily worth the complication. Readability definitely counts.

newville avatar Sep 08 '22 15:09 newville

@lellid

what I was really wondering about was why you don't have any tests for the function values in your code?

Probably because we feel like we don't need those specifically.

The tests of fitting with these functions do test the values of parameters, so if the functions break, that will very likely be reflected in the fit results.

At least I haven't seen any. Would make those numerical tweakings a lot more secure, if you can be sure that the values come out that you expect.

Well, I guess we expect that we are not changing the lineshape definitions very much ;)

newville avatar Sep 08 '22 15:09 newville

@reneeotten @lellid I'm just going to squash and merge this. If there are more issues to resolve, they can be done afterward.

newville avatar Sep 16 '22 13:09 newville