scikit-tree icon indicating copy to clipboard operation
scikit-tree copied to clipboard

Sparse Implementation of `build_coleman_forest` plus a fix for bottleneck import

Open ryanhausen opened this issue 1 year ago • 9 comments
trafficstars

Reference Issues/PRs

Fixes #310

What does this implement/fix? Explain your changes.

This adds a sparse implementation of build_coleman_forest to treeple.stats. The sparse implementation relies on scipy.stats.sparse and so only works in the case of binary classification and regression. In my tests using cProfile/memray, the sparse implementation is a little over 50% faster and uses about %7 less memory, see pdf for more information. I am also attaching the profiling data if you'd like to take a look.

This PR also includres a change to the bottleneck implementation to for the dense implementation of build_coleman_forest. Specifically there are two important changes:

  1. The check for the existence of the bottleneck package is now done using importlib.util.find_spec("bottleneck"). The old check using sys seemed to work fine and passed tests, but actually didn't seem work unless bottleneck had been imported before, which when the tests are run is true, but may not be the case in other situations
  2. the aliased and lambda functions nanmean_f and anynan_f are now defined using def so that the warning message can moved into their respective calls rather than at import.

Any other comments?

The scipy.stats sparse implementation should be swapped out for pydata.sparse when that implementation is performant enough, as it is more general.

ryanhausen avatar Aug 20 '24 19:08 ryanhausen

Codecov Report

Attention: Patch coverage is 89.79592% with 10 lines in your changes missing coverage. Please review.

Project coverage is 80.27%. Comparing base (1d970b0) to head (d733508). Report is 3 commits behind head on main.

Files Patch % Lines
treeple/stats/forest.py 81.39% 5 Missing and 3 partials :warning:
treeple/stats/utils.py 96.36% 0 Missing and 2 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #317      +/-   ##
==========================================
+ Coverage   80.00%   80.27%   +0.26%     
==========================================
  Files          24       24              
  Lines        2221     2296      +75     
  Branches      411      422      +11     
==========================================
+ Hits         1777     1843      +66     
- Misses        312      317       +5     
- Partials      132      136       +4     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 20 '24 20:08 codecov[bot]

@adam2392 I have updated the PR, let me know if I missed anything or if you want to see some more changes.

ryanhausen avatar Aug 22 '24 14:08 ryanhausen

@adam2392 I added some documentation to address your comments. Let me know what you think.

ryanhausen avatar Aug 26 '24 15:08 ryanhausen

I will let @sampan501 @SUKI-O @YuxinB @PSSF23 take a look as well.

adam2392 avatar Aug 26 '24 15:08 adam2392

@sampan501 @adam2392 Hey I added a test, but it looks like I some other tests are failing. I submitted an issue, #328. I'll start looking into it, but it might go faster if you have an intuition on what the issue is.

ryanhausen avatar Sep 23 '24 21:09 ryanhausen

This is gonna be an issue due to parameter validation in scikit-learn. I will push a fix when I have time. It seems the rest of the code works tho.

I would prioritize other issues until I have the time to push up the fix

adam2392 avatar Sep 23 '24 22:09 adam2392

@adam2392 sounds good. Should we let this sit and update and merge after you have a fix for the other issue or merge as is? @sampan501

ryanhausen avatar Sep 23 '24 23:09 ryanhausen

I'm happy to merge if @adam2392 is cool with it

sampan501 avatar Sep 23 '24 23:09 sampan501

@adam2392 sounds good. Should we let this sit and update and merge after you have a fix for the other issue or merge as is? @sampan501

I will merge once the fix is added and the CI is confirmed to be happy. No rush on this one.

adam2392 avatar Sep 24 '24 00:09 adam2392

@ryanhausen if you update wrt main, the CIs should be fixed

adam2392 avatar Oct 10 '24 21:10 adam2392

Thanks @ryanhausen !

adam2392 avatar Oct 11 '24 12:10 adam2392