BUG: roll_skew and roll_kurt compute different results when given the same data with different lengths

Open cutebomb opened this issue 2 years ago • 5 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

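import numpy as np
import pandas as pd

# Same underlying data, truncated to two different lengths
a = np.random.random(20) * 1000
a1 = a[:10]
a2 = a[:12]
df = pd.Series(a1)
df2 = pd.Series(a2)

# Rolling skew over the shared first 10 elements
r1 = df.rolling(8, min_periods=1).skew()
r2 = df2.rolling(8, min_periods=1).skew()
# Indices where both results are exactly equal (NaN positions never match)
print(np.where(r1.values[:10] == r2.values[:10]))

# Same comparison for rolling kurtosis
r1 = df.rolling(8, min_periods=1).kurt()
r2 = df2.rolling(8, min_periods=1).kurt()
print(np.where(r1.values[:10] == r2.values[:10]))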

Issue Description

Rolling skew and kurt compute different results when given the same data with different lengths. For rolling windows over exactly the same data, it seems weird to get different results, right? This is basically caused by subtracting the mean value from the series, in pandas/_libs/window/aggregations.pyx:621-624 in the source tree; the same code exists in roll_kurt too.
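The effect described above can be illustrated with a minimal NumPy sketch. This is not the pandas implementation: `skew_from_raw_sums` is a hypothetical helper that computes bias-corrected skewness from raw power sums, and the two shifts are assumed to be the means of the full input arrays. Skewness is shift-invariant mathematically, but in floating point the same window shifted by two different constants can differ in the last bits.

```python
import numpy as np


def skew_from_raw_sums(x):
    """Bias-corrected skewness computed from raw power sums (hypothetical helper)."""
    n = x.size
    a = x.sum() / n                                   # mean
    b = (x * x).sum() / n - a * a                     # second central moment
    c = (x * x * x).sum() / n - a**3 - 3.0 * a * b    # third central moment
    return np.sqrt(n * (n - 1)) * c / ((n - 2) * b**1.5)


rng = np.random.default_rng(0)
a = rng.random(20) * 1000

# One 8-element window that appears in both the 10- and the 12-element input
window = a[2:10]

# Assumption: each input is centered on the mean of the *whole* array,
# so the shift depends on how long the input is
shift_short = a[:10].mean()
shift_long = a[:12].mean()

skew_short = skew_from_raw_sums(window - shift_short)
skew_long = skew_from_raw_sums(window - shift_long)

print(skew_short, skew_long, skew_short - skew_long)
```

Any difference printed here comes only from rounding, since both calls see the same eight values up to a constant offset.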

Expected Behavior

The rolling skew and kurt results should be the same.
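To tell bit-level differences apart from meaningful ones, the comparison can also be done with a tolerance. The sketch below is one way to do that; the helper name `compare` and the `rtol` value are assumptions chosen for illustration. Note that exact `==` never matches at NaN positions, and with `min_periods=1` the first results are NaN because skew needs at least 3 observations and kurt at least 4.

```python
import numpy as np
import pandas as pd


def compare(r1, r2, rtol=1e-9):
    """Compare two result arrays exactly and within a relative tolerance."""
    exact = r1 == r2                                    # NaN == NaN is False
    close = np.isclose(r1, r2, rtol=rtol, atol=0.0, equal_nan=True)
    max_abs_diff = np.nanmax(np.abs(r1 - r2))           # ignores NaN positions
    return exact, close, max_abs_diff


rng = np.random.default_rng(0)
a = rng.random(20) * 1000
short = pd.Series(a[:10]).rolling(8, min_periods=1)
long_ = pd.Series(a[:12]).rolling(8, min_periods=1)

exact_skew, close_skew, diff_skew = compare(
    short.skew().to_numpy(), long_.skew().to_numpy()[:10]
)
exact_kurt, close_kurt, diff_kurt = compare(
    short.kurt().to_numpy(), long_.kurt().to_numpy()[:10]
)

print("skew: exact matches", exact_skew.sum(), "close", close_skew.sum(), "max diff", diff_skew)
print("kurt: exact matches", exact_kurt.sum(), "close", close_kurt.sum(), "max diff", diff_kurt)
```

The maximum absolute difference shows how far apart the two runs are, which helps judge whether the mismatch is at the level of rounding error.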

Installed Versions

INSTALLED VERSIONS

commit           : 0f437949513225922d851e9581723d82120684a6
python           : 3.10.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.19.0-46-generic
Version          : #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 16 13:30:11 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.3
numpy            : 1.23.1
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 59.6.0
pip              : 23.2.1
Cython           : 0.29.34
pytest           : 7.3.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 3.1.2
lxml.etree       : 4.9.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.13.2
pandas_datareader: None
bs4              : 4.12.2
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : 2023.5.0
gcsfs            : None
matplotlib       : 3.7.1
numba            : 0.57.0
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 12.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.10.1
snappy           : None
sqlalchemy       : 2.0.16
tables           : None
tabulate         : 0.9.0
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None

cutebomb avatar Aug 03 '23 03:08 cutebomb

Hi, thanks for your report. Are you sure that this is not caused by a loss of precision? Your values are quite large for these algorithms

phofl avatar Aug 06 '23 12:08 phofl

I don't think it is caused by a loss of precision, and values in (-1000, 1000) are not very large, are they? By its nature, a rolling algorithm should keep old (historical) results stable as new values arrive. My temporary fix for this problem is this commit, which removes the code that subtracts the mean: https://github.com/cutebomb/pandas/commit/c5391d4d1b9e9d85f363d2b6519af32a59b756f6

cutebomb avatar Aug 10 '23 01:08 cutebomb

Hi, I am running into the same issue currently. Any ideas?

MattDXB avatar Feb 21 '24 11:02 MattDXB

Check my patch for a temporary solution :)

cutebomb avatar Mar 11 '24 15:03 cutebomb

Yes, I saw it, thanks! The issue is that I'm in a company setting and I cannot recompile Cython code...

MattDXB avatar Mar 12 '24 08:03 MattDXB
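
For setups where recompiling Cython is not an option, one possible pure-Python workaround is to compute each window independently with `rolling(...).apply(..., raw=True)`. The sketch below is not the pandas implementation: `window_skew` and `window_kurt` are hypothetical helpers using the standard bias-corrected sample skewness and excess kurtosis formulas, and they assume the data contains no NaN values. Constant windows return NaN here, which may differ from what pandas returns.

```python
import numpy as np
import pandas as pd


def window_skew(x):
    """Bias-corrected sample skewness of one window (hypothetical helper)."""
    n = x.size
    if n < 3:
        return np.nan
    d = x - x.mean()              # center on this window's own mean
    m2 = np.mean(d * d)
    if m2 == 0:
        return np.nan             # constant window: skewness undefined
    m3 = np.mean(d**3)
    return np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2**1.5


def window_kurt(x):
    """Bias-corrected sample excess kurtosis of one window (hypothetical helper)."""
    n = x.size
    if n < 4:
        return np.nan
    d = x - x.mean()
    m2 = np.mean(d * d)
    if m2 == 0:
        return np.nan             # constant window: kurtosis undefined
    m4 = np.mean(d**4)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2**2 - 3 * (n - 1))


rng = np.random.default_rng(0)
a = rng.random(20) * 1000

# raw=True passes each window to the helper as a NumPy array
skew_short = pd.Series(a[:10]).rolling(8, min_periods=1).apply(window_skew, raw=True)
skew_long = pd.Series(a[:12]).rolling(8, min_periods=1).apply(window_skew, raw=True)

# Each window is computed from its own values only, so the shared prefix should agree
print(np.array_equal(skew_short.to_numpy(), skew_long.to_numpy()[:10], equal_nan=True))
```

Because every window is centered on its own mean, the result for a given window does not depend on how long the series is. The trade-off is speed: `apply` calls a Python function once per window, so it is much slower than the built-in `skew()` and `kurt()`.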

The results for Skew are correct with #62863. Kurtosis still gives different results, but not by a large margin.

Alvaro-Kothe avatar Nov 06 '25 22:11 Alvaro-Kothe