BUG: Fixed issue where rolling.kurt() calculations would be affected by values outside of scope

Open eicchen opened this issue 7 months ago • 1 comments

  • [x] closes #61416
  • [x] Tests added and passed if fixing a bug or adding a new feature
  • [x] All code checks passed.
  • [x] Added type annotations to new arguments/methods/functions.
  • [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Might have found an unrelated issue when calculating kurtosis for numbers >1e6, but I'll have to look into it more and open an issue if that is the case.

eicchen avatar May 22 '25 22:05 eicchen

@mroeschke my PR hasn't been reviewed for a while now, just checking if it will be reviewed or if I should just close it.

(sorry if it's a bother, I know you guys probably all have a lot on your plates and I didn't know who to ping)

eicchen avatar Jun 10 '25 18:06 eicchen

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions[bot] avatar Jul 26 '25 00:07 github-actions[bot]

I see you've been working on your own PR, have you taken on things from this fix? Have been occupied with school work, so haven't had time to look til now. If not, I can still work on it, just lmk

eicchen avatar Oct 28 '25 21:10 eicchen

have you taken on things from this fix?

My approach differs a lot from yours, so no.

Alvaro-Kothe avatar Oct 28 '25 21:10 Alvaro-Kothe

More in the solution-sense. I saw a commit for outliers in window values on your PR so I wasn't sure if you've already started tackling the same issue

eicchen avatar Oct 28 '25 21:10 eicchen

Got it. I am checking for catastrophic cancellation when updating the 3rd central moment, as it's the most sensitive of all. When this happens, I recompute the window.
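A minimal sketch of that idea in plain Python, not the code from either PR. It assumes the moment is derived from running sums of x, x**2 and x**3, and that cancellation is flagged when the result is tiny relative to the terms that produced it. The names `rolling_m3`, `direct_m3` and the `rtol` threshold are hypothetical.

```python
import math


def direct_m3(window):
    """Third central moment of one window, computed from scratch."""
    n = len(window)
    mean = math.fsum(window) / n
    return math.fsum((v - mean) ** 3 for v in window) / n


def rolling_m3(values, window, rtol=1e-8):
    """Rolling third central moment with a recompute fallback.

    `rtol` is an illustrative threshold, not a value taken from pandas.
    """
    out = [math.nan] * len(values)
    s1 = s2 = s3 = 0.0  # running sums of x, x**2, x**3
    for i, v in enumerate(values):
        s1 += v
        s2 += v * v
        s3 += v * v * v
        if i >= window:
            old = values[i - window]  # value leaving the window
            s1 -= old
            s2 -= old * old
            s3 -= old * old * old
        if i >= window - 1:
            mean = s1 / window
            # E[(x - mean)^3] = E[x^3] - 3*mean*E[x^2] + 2*mean^3
            a = s3 / window
            b = 3.0 * mean * s2 / window
            c = 2.0 * mean**3
            m3 = a - b + c
            # If the result is tiny compared with the terms that produced it,
            # most significant digits cancelled: recompute from the raw window.
            if abs(m3) <= rtol * (abs(a) + abs(b) + abs(c)):
                m3 = direct_m3(values[i - window + 1 : i + 1])
            out[i] = m3
    return out
```

With inputs around 1e6 the three terms are on the order of 1e18 while the true moment is small, so the fallback fires on every window; with small, well-scaled inputs the fast path is used. The trade-off is that each recompute costs O(window) instead of O(1).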

Alvaro-Kothe avatar Oct 28 '25 21:10 Alvaro-Kothe

So should I still fix up this PR then?

eicchen avatar Oct 28 '25 21:10 eicchen

Honestly, I don't know. But I think that we should arrive at a general solution for numerical stability (algorithm-wise) to compute the rolling variance, skewness and kurtosis.

I don't know if my solution is good enough, or if your approach is better in terms of stability and performance.

Alvaro-Kothe avatar Oct 28 '25 21:10 Alvaro-Kothe

Is this an issue with data precision limitations? It's been a minute. I did open an enhancement request for implementing double-double arithmetic so we can work with extremely large and small float64s without multiple people implementing different methods of dealing with numerical stability due to data type. What do you think? Issue: #62870
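For context, a rough sketch of what double-double accumulation looks like, assuming Knuth's error-free `two_sum` as the building block. The function names are illustrative and are not part of pandas or of the linked issue.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) where s is the rounded sum
    and e is the rounding error, so s + e equals a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(acc, x):
    """Add a float x to a double-double accumulator acc = (hi, lo)."""
    hi, lo = acc
    s, e = two_sum(hi, x)
    # Fold the old low part into the new error, then renormalize.
    return two_sum(s, e + lo)


def dd_sum(values):
    """Sum floats while carrying the low-order bits in a second float."""
    acc = (0.0, 0.0)
    for v in values:
        acc = dd_add(acc, v)
    return acc


def naive_sum(values):
    """Plain left-to-right float64 summation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total
```

For an input such as `[1e16, 1.0, -1e16]`, `naive_sum` loses the `1.0` because it falls below the spacing of floats near 1e16, while `dd_sum` keeps it in the low part. A real implementation would also need double-double multiplication to cover the higher moments.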

eicchen avatar Oct 28 '25 22:10 eicchen

Is this an issue with data precision limitations?

Yes, most of the problems come from floating-point arithmetic. Using a more precise data type or more stable algorithms can mitigate some of them.

I did open an enhancement request for implementing double-double arithmetic

Seems good. But for now, it doesn't seem clear to me how it should be implemented and integrated into the existing functionality.

Alvaro-Kothe avatar Oct 28 '25 22:10 Alvaro-Kothe

I have two ideas:

  • Overload the function at runtime depending on whether the inputs have 14 significant digits
  • Create a separate double-double Cython implementation so we can implement them as needed
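The first idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the significant-digit count is taken from the shortest `repr` of each float, the 14-digit threshold is applied as "at least 14", and `math.fsum` stands in for a real double-double kernel.

```python
import math
from decimal import Decimal

SIGFIG_THRESHOLD = 14  # figure from the comment; how it is applied is an assumption


def significant_digits(x):
    """Significant decimal digits in the shortest repr of a finite float."""
    return len(Decimal(repr(x)).normalize().as_tuple().digits)


def float64_sum(values):
    """Ordinary float64 accumulation (the existing fast path)."""
    total = 0.0
    for v in values:
        total += v
    return total


def extended_sum(values):
    """Stand-in for a double-double kernel; math.fsum is used only as a placeholder."""
    return math.fsum(values)


def pick_kernel(values):
    """Choose a kernel at runtime based on the precision the inputs carry."""
    if any(significant_digits(v) >= SIGFIG_THRESHOLD for v in values):
        return extended_sum
    return float64_sum
```

Scanning every value through `Decimal` would be far too slow for a real rolling kernel, so a production version would need a cheaper test. The sketch only shows the shape of the dispatch, not a proposed implementation.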

Assuming that's what the question was about

eicchen avatar Oct 28 '25 22:10 eicchen