Segmentation Faults on Small Datasets

Open evanharwin opened this issue 3 years ago • 4 comments

What Happened

I get a segmentation fault when running catch22.catch22_all on short lists or numpy arrays.

What I Expected to Happen

The call returns a dictionary of features (perhaps with many NaN values because the time series is so short).

Minimal Complete Verifiable Example

>>> import catch22
>>> catch22.catch22_all([1,2])
Segmentation fault

Further Details

You can check that this doesn't happen with longer arrays like so:

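import catch22

# Start with 10 points and drop one per iteration until the list is empty.
timeseries = list(range(10))
while timeseries:
    # Flush so the last length is visible even if the next call segfaults.
    print(len(timeseries), flush=True)
    catch22.catch22_all(timeseries)
    timeseries = timeseries[:-1]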

evanharwin avatar Oct 29 '21 09:10 evanharwin

Thanks for the clear description. It would indeed be preferable to have the behavior you describe: for features that require a minimum length, an early check on the time-series length, with a NaN output if it is under the minimum, would help avoid seg faults. @chlubba do you have bandwidth? Otherwise @hendersontrent might be able to help.

benfulcher avatar Nov 01 '21 22:11 benfulcher

There is some code @chlubba wrote in main.c to determine whether a time series can be run through catch22 (lines 23-45 here), but the main.c operations don't get used outside of raw C, so the wrappers (e.g., Python) are not performing this check. We could either add a check to each of the 22 feature functions, or add a single call within the core Python/Matlab/etc. catch22_all() function that does this before computing anything. Happy to consider an alternative if I have misunderstood.
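A minimal sketch of the second option (a single check in the wrapper) might look like the following. It is an illustration under stated assumptions, not the project's code: guarded_catch22_all, MIN_LENGTH and the compute_all / feature_names parameters are hypothetical names, the threshold value is a placeholder because the actual minimum length checked in main.c is not given in this thread, and the dictionary return shape follows the expectation stated in the issue.

```python
import math

# Hypothetical threshold: the real minimum length used by main.c is not
# stated in this thread, so this value is only a placeholder.
MIN_LENGTH = 10


def guarded_catch22_all(series, compute_all, feature_names, min_length=MIN_LENGTH):
    """Return NaN for every feature when the series is too short.

    compute_all is whatever callable actually computes the features (for
    example the wrapper's existing catch22_all). It is only called when the
    series is long enough, so the C code never sees a too-short input.
    """
    if len(series) < min_length:
        # Early exit: one NaN per feature instead of calling into C.
        return {name: math.nan for name in feature_names}
    return compute_all(series)
```

The compute function is passed in as an argument so the guard can be exercised with a stand-in function, without loading the compiled extension at all.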

hendersontrent avatar Nov 01 '21 23:11 hendersontrent

Hi @evanharwin, @hendersontrent and @benfulcher,

What you, @hendersontrent, proposed, namely adding the check in the core wrapping functions, makes the most sense to me.

I could find the time to do this in ~2 weeks. If you are able to add this check earlier, thanks for doing so!

Best, Carl

chlubba avatar Nov 02 '21 10:11 chlubba

I just ran each feature individually in Rcatch22 to test which ones error on a T = 2 time series, and CO_Embed2_Dist_tau_d_expfit_meandiff was the only one to do so. Maybe the fix only needs to go into the C code for that one feature, @benfulcher?
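For anyone repeating this kind of per-feature test from Python, one way to keep a segfault from killing the test session is to run each call in a child process and inspect its exit status. The sketch below is an assumption-laden illustration rather than the procedure used above (which was run in Rcatch22): crashes_with_signal is a hypothetical helper, and it relies on the POSIX convention that subprocess reports a negative return code when the child is killed by a signal.

```python
import subprocess
import sys


def crashes_with_signal(snippet, timeout=60):
    """Run a Python snippet in a child process.

    Returns the number of the signal that killed the child (e.g. 11 for
    SIGSEGV on Linux), or None if the child exited on its own.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means "terminated by signal N".
    if result.returncode < 0:
        return -result.returncode
    return None
```

Passing a snippet such as "import catch22; catch22.catch22_all([1, 2])" would then report the crash to the parent process instead of taking it down; the same pattern could be looped over individual feature functions, whose names depend on the wrapper in use.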

hendersontrent avatar Nov 08 '21 00:11 hendersontrent