scikit-tree
scikit-tree copied to clipboard
Specifying dim_contiguous causes seg faults in PatchObliqueForestClassifier
Checklist
- [x] I have verified that the issue exists against the
mainbranch. - [ ] I have read the relevant section in the contribution guide on reporting bugs.
- [ ] I have checked the issues list for similar or identical bug reports.
- [ ] I have checked the pull requests list for existing proposed fixes.
- [ ] I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
- [ ] I have included in the "Description" section below a traceback from any exceptions related to this bug.
- [ ] I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
- [ ] I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
- [ ] I have included in the "Environment" section below the output of
pip freeze. - [x] I have included in the "Steps to reproduce" section below a minimally reproducible example.
Description
When there are non-contiguous dimensions, there are segmentation faults when fitting. See below for example.
Setting dim_contiguous=(True, True) runs fine, but dim_contiguous=(False, True) leads to seg faults.
Python traceback:
Related issues or possible duplicates
- None
Environment
OS: Ubuntu
Python version: 3.9.16
scikit-tree version: 0.6.1
Output of pip freeze:
Steps to reproduce
Example source:
import numpy as np
from sktree import PatchObliqueRandomForestClassifier
n, a, b = 100, 10, 10
x = np.random.normal(size=(n, a, b))
y = np.random.binomial(1, 0.5, size=(n))
rf = PatchObliqueRandomForestClassifier(
n_estimators=100,
min_patch_dims=[1, 1],
max_patch_dims=[4, 4],
dim_contiguous=(False, True),
data_dims=(a, b),
n_jobs=-1,
)
rf.fit(x.reshape(100, -1), y)
It seems this test works when n_estimators is low, but not when I ramp to 50 or maybe 100. This makes me think there is a sampling edge case error where we go out of the bounds of the image:
def test_contiguous_and_discontiguous_patch_forest():
"""Test regression reported in https://github.com/neurodata/scikit-tree/issues/215."""
n, a, b = 100, 10, 10
x = np.random.normal(size=(n, a, b))
y = np.random.binomial(1, 0.5, size=(n))
est = PatchObliqueRandomForestClassifier(
n_estimators=20,
min_patch_dims=[1, 1],
max_patch_dims=[4, 4],
dim_contiguous=(False, True),
data_dims=(a, b),
n_jobs=-1
)
est.fit(x.reshape(100, -1), y)
I think the issue is isolated to when dim_contiguous=False you can test this with a single tree iterated over 100-1000 different random seeds. When dim_contiguous=True , seems there is consistently no error.
It must occur in accessing an index that is not valid in one of two areas:
- https://github.com/neurodata/scikit-tree/blob/18c2f457b7fe35d3ab95fdd48e4212d325321b09/sktree/tree/manifold/_morf_splitter.pyx#L339-L350
- https://github.com/neurodata/scikit-tree/blob/18c2f457b7fe35d3ab95fdd48e4212d325321b09/sktree/tree/manifold/_morf_splitter.pyx#L375-L401