"segmentation fault" with huge loess stat_smooth (from plotnine)
I'm quite sure this is an issue in scikit-misc, so I'm filing it here. I ran into the following while doing plots with https://github.com/has2k1/plotnine.
#!/usr/bin/env python3
import numpy as np
import pandas as pd
from plotnine import *
time_int = np.array(range(30000))
time_float = np.linspace(0, 500, 30000)
values = np.random.randint(1, 1000, 30000)
df = pd.DataFrame({'time_int': time_int, 'time_float': time_float, 'values': values})
df.info()
plot1 = ggplot(df, aes(x='time_int', y='values')) \
    + stat_smooth(method='loess')
plot2 = ggplot(df, aes(x='time_float', y='values')) \
    + stat_smooth(method='loess')
# print(plot1) # gives 'out of memory'
print(plot2) # crashes with segfault
With print(plot1) this prints:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 3 columns):
time_float 30000 non-null float64
time_int 30000 non-null int64
values 30000 non-null int64
dtypes: float64(1), int64(2)
memory usage: 703.2 KB
[skmisc/loess/src/misc.c:34] Out of memory (7200000000 bytes)
With print(plot2) this prints:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 3 columns):
time_float 30000 non-null float64
time_int 30000 non-null int64
values 30000 non-null int64
dtypes: float64(1), int64(2)
memory usage: 703.2 KB
zsh: segmentation fault (core dumped) ./test.py
I understand and expect the "Out of memory" error given the size of the data; the loess algorithm is O(n^2) in memory. What I do not expect is a segfault, which I suspect is related to the low-memory situation (probably an unchecked malloc).
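For what it's worth, the 7200000000 bytes in the error message is exactly the size of an n x n matrix of doubles for n = 30000, which is consistent with the O(n^2) memory behaviour:

```python
# Back-of-the-envelope check: a loess fit over n points needs working
# storage on the order of an n x n float64 matrix.
n = 30000
approx_bytes = n * n * 8  # 8 bytes per float64
print(approx_bytes)       # 7200000000 -- the figure reported by misc.c:34
```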
Both tests run on my system, but with 40000 rows I get segfaults for both plots.
The "Out of memory" error is absolutely expected. The bug I wanted to report is the segfault.
Now that I have tested the above code again, I get segfaults for both plots and can't find a size at which I only get the "Out of memory" error.
I noticed this while using plotnine in a Jupyter notebook. I had method set to loess and then increased the size of the dataframe; suddenly the IPython kernel kept crashing when generating the plot. I think scikit-misc (or plotnine?) should catch this instead of crashing.
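As a stopgap until the crash is fixed, one workaround (my own sketch, not part of scikit-misc or plotnine) is to downsample the dataframe before fitting loess, since the memory cost grows quadratically with the number of rows:

```python
import numpy as np
import pandas as pd

def subsample(df, n=1000, seed=0):
    """Return at most n randomly chosen rows, kept in original order.

    Hypothetical helper for keeping loess fits tractable; n and seed
    are arbitrary choices, not library defaults.
    """
    if len(df) <= n:
        return df
    return df.sample(n=n, random_state=seed).sort_index()

df = pd.DataFrame({'time_float': np.linspace(0, 500, 30000),
                   'values': np.random.randint(1, 1000, 30000)})
small = subsample(df)
print(len(small))  # 1000
```

Feeding `small` instead of `df` to `ggplot(...)` + `stat_smooth(method='loess')` keeps the fit well under the memory limit while the smoothed curve stays visually similar.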
Yes, the segfaults cause the Jupyter kernel to crash.
Any update on this?