[Feature] Add Random Forest Classifier to linfa-trees
Description
I would like to contribute a new module to the linfa-trees crate that implements the Random Forest algorithm for classification tasks. This will expand linfa-trees from single decision trees into ensemble learning, aligning closely with scikit-learn's functionality in Python.
Motivation
Random Forests are a powerful ensemble learning method used widely in classification tasks. They provide:
- Robustness to overfitting
- Better generalization than single trees
- Feature importance estimates
Currently, linfa-trees provides support for single decision trees. By adding Random Forests, we unlock ensemble learning for the Rust ML ecosystem.
Proposed Design
New Module
A new file will be added:
This will include:
- RandomForestClassifier<F: Float>
- RandomForestParams<F> (unchecked)
- RandomForestValidParams<F> (checked)
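The unchecked/checked split can be illustrated with a minimal, dependency-free sketch. This is not the proposed API: the fields `n_trees` and `feature_fraction`, the `ParamError` enum, and the non-generic structs are all hypothetical, and the inherent `check` method only stands in for what linfa's `ParamGuard` trait would provide.

```rust
/// Validated hyperparameters. The fields are hypothetical, chosen only
/// to illustrate the pattern; the real module may expose different ones.
#[derive(Debug, Clone, PartialEq)]
pub struct RandomForestValidParams {
    n_trees: usize,
    feature_fraction: f64,
}

/// Unchecked builder: it wraps the validated struct but gives no
/// guarantee that the values are sane until `check` is called.
#[derive(Debug, Clone, PartialEq)]
pub struct RandomForestParams(RandomForestValidParams);

/// Hypothetical error type for rejected hyperparameters.
#[derive(Debug, PartialEq)]
pub enum ParamError {
    ZeroTrees,
    InvalidFeatureFraction(f64),
}

impl RandomForestParams {
    /// Start from (assumed) defaults.
    pub fn new() -> Self {
        Self(RandomForestValidParams {
            n_trees: 100,
            feature_fraction: 1.0,
        })
    }

    pub fn n_trees(mut self, n: usize) -> Self {
        self.0.n_trees = n;
        self
    }

    pub fn feature_fraction(mut self, f: f64) -> Self {
        self.0.feature_fraction = f;
        self
    }

    /// Stand-in for a `ParamGuard`-style check: consumes the unchecked
    /// builder and returns the validated parameters or an error.
    pub fn check(self) -> Result<RandomForestValidParams, ParamError> {
        if self.0.n_trees == 0 {
            return Err(ParamError::ZeroTrees);
        }
        let f = self.0.feature_fraction;
        // Written as a negated range test so that NaN is rejected too.
        if !(f > 0.0 && f <= 1.0) {
            return Err(ParamError::InvalidFeatureFraction(f));
        }
        Ok(self.0)
    }
}

impl RandomForestValidParams {
    pub fn n_trees(&self) -> usize {
        self.n_trees
    }

    pub fn feature_fraction(&self) -> f64 {
        self.feature_fraction
    }
}
```

With this shape, fitting code only ever accepts `RandomForestValidParams`, so invalid settings are caught once, at `check`, rather than inside the training loop.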
Trait Implementations
I will implement the following traits according to linfa conventions:
- ParamGuard for parameter validation
- Fit to train the forest using bootstrapped data and random feature subsetting
- PredictInplace and Predict to perform inference via majority voting
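The three ingredients named above (bootstrap sampling, random feature subsetting, and majority voting) can be sketched with the standard library alone. Everything here is an illustrative assumption rather than the planned implementation: the `XorShift64` generator is a stand-in for a proper RNG from the `rand` crate, labels are plain `usize` values, and ties are broken toward the smallest label.

```rust
use std::collections::HashMap;

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// A real implementation would more likely accept an RNG from `rand`.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from the all-zero state.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in `0..n`; `n` must be non-zero. The slight modulo bias
    /// is acceptable for a sketch.
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Bootstrap: draw `n_samples` row indices with replacement.
pub fn bootstrap_indices(n_samples: usize, rng: &mut XorShift64) -> Vec<usize> {
    (0..n_samples).map(|_| rng.below(n_samples)).collect()
}

/// Feature subsetting: pick `k` distinct column indices using a
/// partial Fisher-Yates shuffle.
pub fn feature_subset(n_features: usize, k: usize, rng: &mut XorShift64) -> Vec<usize> {
    let k = k.min(n_features);
    let mut cols: Vec<usize> = (0..n_features).collect();
    for i in 0..k {
        let j = i + rng.below(n_features - i);
        cols.swap(i, j);
    }
    cols.truncate(k);
    cols
}

/// Majority vote over the per-tree predictions for one sample.
/// Ties go to the smallest label; an empty slice yields `None`.
pub fn majority_vote(votes: &[usize]) -> Option<usize> {
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &v in votes {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
        .into_iter()
        // Higher count wins; on equal counts the smaller label wins.
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(label, _)| label)
}
```

In a forest, each tree would be fitted on the rows returned by `bootstrap_indices` restricted to the columns from `feature_subset`, and prediction would call `majority_vote` once per sample on the labels emitted by all trees.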
Example
An example will be added in:
Using the Iris dataset from linfa-datasets.
Benchmark (Optional)
If approved, I can also add a benchmark using Criterion:
File Integration Plan
- src/lib.rs: Re-export random_forest::*
- src/decision_trees/mod.rs: pub mod random_forest;
- README.md: Update with a section on Random Forests and example usage
- examples/iris_random_forest.rs: Demonstrates training and evaluation
API Preview
Conformity with CONTRIBUTING.md
- Uses Float trait for f32/f64 compatibility
- Follows the Params → ValidParams validation pattern
- Implements Fit, Predict, and PredictInplace using Dataset
- Optional serde support via feature flag
- Will include unit tests and optionally benchmarks
Request
Please let me know if you're open to this contribution. I'd be happy to align with maintainers on:
- Feature scope (classifier first, regressor later?)
- Benchmarking standards
- Integration strategy (e.g., reuse of DecisionTree)
Looking forward to your guidance!
Thanks for your thorough description. This looks good to me, please proceed with a PR!
Sorry, I just noticed prior art in #229. It would be great to take a look at it before jumping into a whole new implementation.
@relf Sure, I will look into #229 and then prepare my PR.
@relf I have opened a PR, please take a look. All checks have passed and I have successfully tested the module.
PR link - https://github.com/rust-ml/linfa/pull/390