smartcore icon indicating copy to clipboard operation
smartcore copied to clipboard

refactor: do not require an empty struct in cross_validate to define …

Open morenol opened this issue 2 years ago • 6 comments

…which is the estimator

Fixes https://github.com/smartcorelib/smartcore/issues/230

Checklist

  • [ ] My branch is up-to-date with development branch.
  • [ ] Everything works and tested on latest stable Rust.
  • [ ] Coverage and Linting have been applied

Current behaviour

New expected behaviour

Change logs

morenol avatar Nov 04 '22 20:11 morenol

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

morenol avatar Nov 04 '22 21:11 morenol

@morenol from a programming perspective I agree, but writing ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _> is really not viable for an interface that has to face an end-user. Think if you had to write that to issue a request using Python requests: requests.request::<_,_,_,_, POST,_,_,_>(...) ?!!? would you ever use something like that?

If you want to remove the new there are these options imo:

  • passing the single fit function using Fn or FnOnce but it is hard as every estimator has different signatures
  • passing the string name "LogisticRegression" and instantiate an object inside fit using an enumerator (as proposed for Kernel in #221) or an enumerator key like EstimatorEnum::LogisticRegression
  • refactoring cross_validate in a way that the user has to write only cross_validate::<LogisticRegression>(...)
  • anything else I don't know about

I used new because there are some other constraints in place and the alternatives are not straightforward. As we have as a main point to keep an easy "pythonic" API we cannot allow Rust's types to reach the end-user's level.

PS: I don't like how cross_validate and cross_validate_predict are implemented, so maybe we should rewrite them in a better way. This could also lead to a better definition of the SupervisedEstimator and Predictor traits. For an example see fit and predict.

Mec-iS avatar Nov 04 '22 21:11 Mec-iS

Codecov Report

Merging #231 (e95798d) into development (aab3817) will increase coverage by 0.11%. The diff coverage is n/a.

@@               Coverage Diff               @@
##           development     #231      +/-   ##
===============================================
+ Coverage        43.93%   44.05%   +0.11%     
===============================================
  Files               85       85              
  Lines             7290     7268      -22     
===============================================
- Hits              3203     3202       -1     
+ Misses            4087     4066      -21     
Impacted Files Coverage Δ
src/ensemble/random_forest_classifier.rs 53.54% <ø> (+0.34%) :arrow_up:
src/ensemble/random_forest_regressor.rs 57.37% <ø> (+0.46%) :arrow_up:
src/linear/elastic_net.rs 50.00% <ø> (+0.39%) :arrow_up:
src/linear/lasso.rs 50.00% <ø> (+0.45%) :arrow_up:
src/linear/linear_regression.rs 35.41% <ø> (+1.41%) :arrow_up:
src/linear/logistic_regression.rs 38.14% <ø> (+0.19%) :arrow_up:
src/linear/ridge_regression.rs 40.17% <ø> (+0.34%) :arrow_up:
src/model_selection/mod.rs 72.22% <ø> (ø)
src/naive_bayes/bernoulli.rs 41.56% <ø> (+0.84%) :arrow_up:
src/naive_bayes/categorical.rs 34.50% <ø> (-1.16%) :arrow_down:
... and 14 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Nov 05 '22 00:11 codecov-commenter

@morenol from a programming perspective I agree, but writing ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _> is really not viable for an interface that has to face an end-user. Think if you had to write that to issue a request using Python requests: requests.request::<_,_,_,_, POST,_,_,_>(...) ?!!? would you ever use something like that?

If you want to remove the new there are these options imo:

  • passing the single fit function using Fn or FnOnce but it is hard as every estimator has different signatures
  • passing the string name "LogisticRegression" and instantiate an object inside fit using an enumerator (as proposed for Kernel in Deserialization for Kernel #221) or an enumerator key like EstimatorEnum::LogisticRegression
  • refactoring cross_validate in a way that the user has to write only cross_validate::<LogisticRegression>(...)
  • anything else I don't know about

I used new because there are some other constraints in place and the alternatives are not straightforward. As we have as a main point to keep an easy "pythonic" API we cannot allow Rust's types to reach the end-user's level.

PS: I don't like how cross_validate and cross_validate_predict are implemented, so maybe we should rewrite them in a better way. This could also lead to a better definition of the SupervisedEstimator and Predictor traits. For an example see fit and predict.

I agree, I am not fan of having an API were something like this is required ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _>. I will keep this open until I find a better way

morenol avatar Nov 05 '22 15:11 morenol

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

Sammyreal1 avatar May 07 '23 00:05 Sammyreal1

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

see discussion at https://github.com/smartcorelib/smartcore/issues/230

Mec-iS avatar May 07 '23 15:05 Mec-iS