
KDEUnivariate.fit with fft=True can crash for some inputs

Open has2k1 opened this issue 11 months ago • 10 comments

Describe the bug

KDEUnivariate(endog).fit(fft=True, cut=0) crashes for certain values of endog.

Code Sample, a copy-pastable example if possible

import statsmodels.api as sm
x = [24.0, 43.0, 27.0, 15.0, 37.0, 8.82]  # bad
kde = sm.nonparametric.KDEUnivariate(x)
kde.fit(fft=True, cut=0)

python3(71738,0x1faed4840) malloc: Incorrect checksum for freed object 0x1170bcc00: probably modified after being freed.
Corrupt value: 0x3d30000000000000
python3(71738,0x1faed4840) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    71738 abort      ipython

Note that the input that triggers the crash can be much longer; the problem is not tied to the input length.

The bug still exists in the main branch.

Expected Output

The fit method should run without crashing, just like it does for this slightly different input (last element of x is 8.8 instead of 8.82).

import statsmodels.api as sm
x = [24.0, 43.0, 27.0, 15.0, 37.0, 8.8] # good
kde = sm.nonparametric.KDEUnivariate(x)
kde.fit(fft=True, cut=0)

Output of import statsmodels.api as sm; sm.show_versions()


INSTALLED VERSIONS

Python: 3.13.2.final.0
OS: Darwin 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:22 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6041 arm64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

statsmodels

Installed: 0.14.4 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/statsmodels)

Required Dependencies

cython: Not installed
numpy: 2.2.4 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/numpy)
scipy: 1.15.2 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/scipy)
pandas: 2.2.3 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/pandas)
dateutil: 2.9.0.post0 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/dateutil)
patsy: 1.0.1 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/patsy)

Optional Dependencies

matplotlib: 3.10.1 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/matplotlib)
backend: macosx
cvxopt: Not installed
joblib: 1.4.2 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/joblib)

Developer Tools

IPython: 9.0.1 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/IPython)
jinja2: 3.1.6 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/jinja2)
sphinx: 8.2.3 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/sphinx)
pygments: 2.19.1 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/pygments)
pytest: 8.3.5 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/pytest)
virtualenv: 20.29.2 (/Users/hassan/.uvenv/statmodels-env/lib/python3.13/site-packages/virtualenv)

has2k1, Apr 04 '25 16:04

This uses numpy's FFT together with Cython code in statsmodels.

I have no problem running the example with Python 3.11, so either numpy's FFT or our Cython code is not working correctly in your setup.

Are there still Cython bugs affecting Python 3.13, or does the Darwin build need to be recompiled with the latest Cython? Or does our linbin.pyx module have outdated code?
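To separate the two suspects named above, one could first exercise numpy's FFT on its own on the affected machine, with no statsmodels code involved. This is a hypothetical sketch: fft_roundtrip_error is not a numpy or statsmodels function, and the grid length 2**10 is an illustrative assumption rather than necessarily what KDEUnivariate uses. A round-trip error near machine precision points away from the FFT, although it does not rule it out.

```python
import numpy as np


def fft_roundtrip_error(n=2**10, seed=0):
    """Hypothetical check: max error of an rfft -> irfft round trip.

    The length n is an assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    signal = rng.random(n)
    spectrum = np.fft.rfft(signal)           # n // 2 + 1 complex coefficients
    recovered = np.fft.irfft(spectrum, n=n)  # back to n real samples
    return float(np.max(np.abs(recovered - signal)))


print(fft_roundtrip_error())  # expected to be close to machine precision
```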

josef-pkt, Apr 04 '25 18:04

No issues for me on Windows either, using Python 3.12.

bashtage, Apr 04 '25 18:04

I also can't reproduce it on Ubuntu 24.04.5 with Python 3.10.12.

bashtage, Apr 04 '25 18:04

Or does our linbin.pyx module have outdated code?

Indeed, the problem is at https://github.com/statsmodels/statsmodels/blob/a5b890fafca61b02eb185b7702a4c73c7b0cf1ab/statsmodels/nonparametric/kde.py#L593

but the Cython code looks innocuous:

https://github.com/statsmodels/statsmodels/blob/a5b890fafca61b02eb185b7702a4c73c7b0cf1ab/statsmodels/nonparametric/linbin.pyx#L15-L36
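For readers unfamiliar with what linbin.pyx computes, here is a minimal NumPy sketch of linear binning. It is an illustration under stated assumptions, not a copy of the statsmodels code; the name linear_bin and its signature are hypothetical. Each observation splits its unit weight between the two grid points that bracket it. Assuming that with cut=0 the grid runs exactly from min(x) to max(x), the largest observation lands on the last grid point (or overshoots it by a rounding error), which is the edge case where a loop that always updates the right-hand neighbour would write one element past the end of the array. Whether that is what happens in linbin.pyx is not established in this thread.

```python
import numpy as np


def linear_bin(x, a, b, m):
    """Hypothetical linear-binning sketch (not statsmodels' linbin.pyx).

    Spreads each observation's unit weight over the two nearest of `m`
    equally spaced grid points on [a, b]. Observations outside [a, b]
    are ignored.
    """
    x = np.ascontiguousarray(x, dtype=float)
    delta = (b - a) / (m - 1)            # grid spacing
    pos = (x - a) / delta                # fractional grid position of each point
    idx = np.floor(pos).astype(np.intp)  # index of the grid point to the left
    rem = pos - idx                      # share that goes to the right neighbour
    counts = np.zeros(m)
    for i, r in zip(idx, rem):
        if 0 <= i < m - 1:
            counts[i] += 1.0 - r
            counts[i + 1] += r
        elif i == m - 1:
            # The point sits on the last grid point (pos may also overshoot
            # it slightly due to rounding). Touching counts[i + 1] here would
            # index one past the end, so the whole weight stays in the last bin.
            counts[i] += 1.0
    return counts


x = [24.0, 43.0, 27.0, 15.0, 37.0, 8.82]
counts = linear_bin(x, min(x), max(x), 2**10)
print(counts.sum())  # total weight; should equal len(x) up to rounding
```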

has2k1, Apr 05 '25 11:04

Does it matter if you call np.ascontiguousarray(x, dtype=float) before?

bashtage, Apr 05 '25 12:04

Does it matter if you call np.ascontiguousarray(x, dtype=float) before?

No difference.

has2k1, Apr 05 '25 14:04

Also, it seems that if x is bad, then x + constant is always bad as well!

import numpy as np

x = [24.0, 43.0, 27.0, 15.0, 37.0, 8.82]  # bad
x2 = np.array(x) + np.random.uniform(0, 100_000)  # bad as well
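Because a hard crash like the malloc abort takes down the whole interpreter, checking many variants of x (such as the shifted x2 above) is easier if each attempt runs in a separate process. The helper below is a hypothetical sketch, not part of statsmodels: it runs a snippet in a fresh Python with `-X faulthandler`, which asks the child to dump a Python traceback on fatal signals, and treats a negative return code (POSIX: killed by a signal, e.g. -6 for SIGABRT) as a crash. The template and shift values in the usage part are illustrative assumptions.

```python
import subprocess
import sys


def crashes(snippet, timeout=60):
    """Hypothetical helper: run `snippet` in a child interpreter.

    Returns True if the child was killed by a signal (negative return
    code on POSIX), which is how an abort() from malloc shows up.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,  # stderr holds the faulthandler traceback
        text=True,
        timeout=timeout,
    )
    return proc.returncode < 0


# Illustrative template: the shifts are arbitrary examples.
TEMPLATE = (
    "import numpy as np, statsmodels.api as sm\n"
    "x = np.array([24.0, 43.0, 27.0, 15.0, 37.0, 8.82]) + {shift!r}\n"
    "sm.nonparametric.KDEUnivariate(x).fit(fft=True, cut=0)\n"
)

if __name__ == "__main__":
    for shift in [0.0, 1.0, 12345.678]:
        print(shift, crashes(TEMPLATE.format(shift=shift)))
```

A child that fails with an ordinary Python exception (for example an ImportError) exits with a positive code and is not counted as a crash.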

has2k1, Apr 05 '25 14:04

How was your statsmodels version compiled? Is this a Darwin package or one of our pypi binaries or directly compiled?

josef-pkt, Apr 05 '25 17:04

How was your statsmodels version compiled? Is this a Darwin package or one of our pypi binaries or directly compiled?

I get the error both with the PyPI binary and when compiling from source.

has2k1, Apr 07 '25 11:04

Ahh, it's arm64. It is almost certainly either a compiler bug on ARM (unlikely) or something wrong with the Cythonization.

bashtage, Apr 07 '25 12:04