
Segmentation fault error when running python version

Open neezi opened this issue 5 years ago • 10 comments

I've been getting segmentation faults whenever I try to run the Python version. For smaller datasets it works fine; here I'm using a dataset of size 100x10000. As soon as I go to 200x10000 it gets hung up, and the iteration it breaks on changes from attempt to attempt: sometimes it breaks on iteration 150, sometimes on 174. It seems like a memory problem?

neezi avatar Oct 08 '19 16:10 neezi

Thanks for opening this issue. What Python version are you running on what system? I'll look into it in the next days.

chlubba avatar Oct 08 '19 18:10 chlubba

Python 3.6 on Ubuntu 18.04.3 LTS. I added an extra line that takes the absolute value of each vector, and that seems to have removed the error.

neezi avatar Oct 08 '19 19:10 neezi

What worked for me was to deactivate the following feature:

CO_Embed2_Dist_tau_d_expfit_meandiff

I would expect that, depending on the dataset, some specific feature might be causing the segmentation fault.

GerardBCN avatar Feb 25 '20 16:02 GerardBCN

@GerardBCN Do you have an example of a specific time series that causes a segmentation fault when running CO_Embed2_Dist_tau_d_expfit_meandiff? If so, we can build in better handling of these cases. We had no issues with the >100k time series we tested this on, right, @chlubba? So it would be interesting to see what sort of time-series structures cause the problem.

benfulcher avatar Feb 26 '20 01:02 benfulcher

Yes, on our time series we did not get segmentation faults. @GerardBCN, as @benfulcher said it would be very helpful for us if you had an example of a time series where the error occurs. Thanks in advance!

chlubba avatar Feb 28 '20 09:02 chlubba

@chlubba @benfulcher There seems to be some memory leakage, which may explain the segmentation fault. Here's a simple example:

for _ in range(1000000):
    catch22_all(np.random.randn(1000))

Run the above and watch your memory steadily increase. I ran this in IPython, and even after I hit Ctrl+C to stop the loop, the accumulated memory was still there; it wasn't until I exited IPython that my memory went back down to normal.

chanshing avatar Jan 15 '21 18:01 chanshing
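The growth described above can be observed directly by sampling the process's peak resident set size around the loop. A minimal sketch using the standard-library `resource` module (POSIX only; `ru_maxrss` is KiB on Linux, bytes on macOS), with `np.mean` as a harmless stand-in for `catch22_all`; note that `tracemalloc` would not help here, since it only tracks Python-level allocations, not memory leaked inside a C extension:

```python
import resource

import numpy as np


def peak_rss_kib():
    # Peak resident set size of this process so far (KiB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_rss_kib()
for _ in range(1000):
    np.mean(np.random.randn(1000))  # replace with catch22_all(...) to reproduce
after = peak_rss_kib()

# With the pure-NumPy stand-in the peak barely moves; with catch22_all
# the reporters above saw it climb steadily.
print("peak RSS growth:", after - before)
```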

The above is not a far-fetched example: it is common to break a long time series into small windows and extract features from each window, i.e. rolling-window feature extraction. This is what I am doing, and that is how I encountered this issue.

chanshing avatar Jan 19 '21 11:01 chanshing
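The rolling-window workload described above can be sketched with NumPy's `sliding_window_view` (NumPy >= 1.20); `np.mean` again stands in for the `catch22_all` call that would run per window in practice:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.random.randn(10_000)

# All length-1000 windows, then keep every 500th one (a hop of 500):
# (10000 - 1000 + 1) = 9001 windows, subsampled to 19.
windows = sliding_window_view(series, window_shape=1000)[::500]

# One feature vector per window; catch22_all(w) would go here.
features = [np.mean(w) for w in windows]
print(len(features))  # 19
```

Each iteration of such a loop calls into the C extension once, which is why even a small per-call leak accumulates quickly.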

Hi @chanshing, thanks for pointing to the issue of memory consumption. Which wrapper are you using (Python, Matlab, R)?

This issue here opened by @neezi is about a segmentation fault. So slightly different topic. But both are related to memory management, I agree.

chlubba avatar Jan 19 '21 12:01 chlubba

Hi @chlubba I'm using the Python wrapper. The full example above would be:

import numpy as np
from catch22 import catch22_all

for _ in range(1000000):
    catch22_all(np.random.randn(1000))

When I run the above code for long enough, I get the segmentation fault mentioned here. That's why I suspect it is related to the memory leak.

chanshing avatar Jan 19 '21 13:01 chanshing
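Until the leak is fixed in the wrapper itself, a common mitigation for memory leaked by a C extension is to run the calls in worker processes that are recycled after a fixed number of tasks, so the operating system reclaims everything when each worker exits. A sketch with `multiprocessing.Pool(maxtasksperchild=...)`, once more using `np.mean` as a stand-in for `catch22_all`:

```python
import numpy as np
from multiprocessing import get_context


def extract(window):
    # Stand-in for catch22.catch22_all(window); swap in the real call.
    return float(np.mean(window))


windows = [np.random.randn(1000) for _ in range(20)]

# Each worker handles at most 5 calls before being replaced, so memory
# leaked inside extract() is returned to the OS with the worker.
# The "fork" start method keeps this snippet POSIX-only but safe to run
# at module level.
with get_context("fork").Pool(processes=2, maxtasksperchild=5) as pool:
    results = pool.map(extract, windows)

print(len(results))  # 20
```

The recycling has a per-worker startup cost, so tune `maxtasksperchild` to trade throughput against the leak rate.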

Hi @chlubba. I confirm there's a memory leak using the R wrapper. It can be checked with the following code:

library(catch22)

for(ii in 1:1E6) {
  catch22::catch22_all(rnorm(1000))
}

quesadagranja avatar May 28 '21 11:05 quesadagranja