scikit-tree icon indicating copy to clipboard operation
scikit-tree copied to clipboard

Specifying dim_contiguous causes seg faults in PatchObliqueForestClassifier

Open j1c opened this issue 1 year ago • 2 comments

Checklist

  • [x] I have verified that the issue exists against the main branch.
  • [ ] I have read the relevant section in the contribution guide on reporting bugs.
  • [ ] I have checked the issues list for similar or identical bug reports.
  • [ ] I have checked the pull requests list for existing proposed fixes.
  • [ ] I have checked the CHANGELOG and the commit log to find out if the bug was already fixed in the main branch.
  • [ ] I have included in the "Description" section below a traceback from any exceptions related to this bug.
  • [ ] I have included in the "Related issues or possible duplicates" section beloew all related issues and possible duplicate issues (If there are none, check this box anyway).
  • [ ] I have included in the "Environment" section below the name of the operating system and Python version that I was using when I discovered this bug.
  • [ ] I have included in the "Environment" section below the output of pip freeze.
  • [x] I have included in the "Steps to reproduce" section below a minimally reproducible example.

Description

When there are non-contiguous dimensions, there are segmentation faults when fitting. See below for example.

Setting dim_contiguous=(True, True) runs fine, but dim_contiguous=(False, True) leads to seg faults.

Python traceback:

Related issues or possible duplicates

  • None

Environment

OS: Ubuntu

Python version: 3.9.16

scikit-tree version: 0.6.1

Output of pip freeze:

Steps to reproduce

Example source:

import numpy as np
from sktree import PatchObliqueRandomForestClassifier

n, a, b = 100, 10, 10

x = np.random.normal(size=(n, a, b))
y = np.random.binomial(1, 0.5, size=(n))

rf = PatchObliqueRandomForestClassifier(
    n_estimators=100,
    min_patch_dims=[1, 1],
    max_patch_dims=[4, 4],
    dim_contiguous=(False, True),
    data_dims=(a, b),
    n_jobs=-1,
)

rf.fit(x.reshape(100, -1), y)

j1c avatar Feb 07 '24 07:02 j1c