catch22
Segmentation Faults on Small Datasets
What Happened
Get a segmentation fault when running catch22.catch22_all on short lists/numpy.arrays.
What I Expected to Happen
Returns a dictionary of features (perhaps with a lot of NaN-types due to the short timeseries)
Minimum Complete Verifiable Example
>>> import catch22
>>> catch22.catch22_all([1,2])
Segmentation fault
Further Details
Can check that this doesn't happen with longer arrays like so:
import catch22

timeseries = list(range(10))
# Shrink the series one element at a time until the call crashes
while timeseries:
    print(len(timeseries))
    catch22.catch22_all(timeseries)
    timeseries = timeseries[:-1]
Thanks for the clear description. The behavior you describe would indeed be preferable: for features that require a minimum length, adding an early check on time-series length that returns NaN when the series is too short would help avoid seg faults. @chlubba do you have bandwidth? Otherwise @hendersontrent might be able to help.
There is some code @chlubba wrote in main.c to determine if code can be run in catch22 (lines 23-45 here), but the main.c operations don't get used outside of raw C, so the wrappers (e.g., Python) are not performing this check. We could either add a check to each of the 22 feature functions, or add a single call within the core Python/Matlab/etc. catch22_all() function that does this prior to computing anything. Happy to consider an alternative if I have misunderstood.
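For reference, the second option (a single check in the wrapper) could look roughly like the Python sketch below. This is only an illustration, not the actual catch22 code: guarded_catch22_all, HYPOTHETICAL_MIN_LENGTH and the compute_all argument are made-up names, the threshold of 10 is a placeholder because the real minimum length is not stated in this thread, and the name-to-value dictionary return shape is an assumption.

```python
import math

# Assumption: the real minimum length is not given in this thread; 10 is a placeholder.
HYPOTHETICAL_MIN_LENGTH = 10


def guarded_catch22_all(data, feature_names, compute_all,
                        min_length=HYPOTHETICAL_MIN_LENGTH):
    """Return NaN for every feature if the series is too short, else delegate.

    compute_all is a stand-in for the wrapped function that calls into the C
    code; it is expected to return a dict mapping feature name to value.
    """
    if len(data) < min_length:
        # Never reach the C code with a series it cannot handle.
        return {name: math.nan for name in feature_names}
    return compute_all(data)
```

With a guard like this, a call on [1, 2] would come back as a dictionary of NaNs instead of reaching the C code, which matches the expected behavior described in the report.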
Hi @evanharwin, @hendersontrent and @benfulcher,
what you, @hendersontrent, proposed, namely to add the check in the core wrapping functions, makes the most sense to me.
I could find the time to do this in ~2 weeks. If you are able to add this check earlier, thanks for doing so!
Best, Carl
I just ran each feature individually in Rcatch22 to test which ones error on a T = 2 time series, and CO_Embed2_Dist_tau_d_expfit_meandiff was the only feature to do so... Maybe the fix only needs to go into the C code for that one feature, @benfulcher?
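A similar per-call check can be done from Python without losing the interpreter to a segfault, by running each call in a fresh process. The sketch below is a rough Python analogue, not what was run in Rcatch22; runs_cleanly is a hypothetical helper name, and the snippet passed to it is whatever call you want to probe.

```python
import subprocess
import sys


def runs_cleanly(snippet, timeout=60):
    """Run `snippet` in a fresh Python process; True only if it exits with status 0.

    A crash in the child (for example a segmentation fault) gives a non-zero
    return code instead of taking down the calling interpreter.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0


# Example probe, using the call from the original report:
# runs_cleanly("import catch22; catch22.catch22_all([1, 2])")
```

Looping this over one snippet per feature would show which features fail at a given length, assuming the individual feature functions are callable from the Python package.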