
`set_engine_args` ?

Open mmp3 opened this issue 3 years ago • 5 comments

I am constructing a parsnip-adjacent package that implements a new parsnip model with several engines.

The algorithms underlying each engine for the common new model are quite heterogeneous, to the point that there is no single parameter that is conceptually the same across all the engines, and the number of conceptually similar arguments between any pair of engines is often zero, or otherwise quite small. Yet, they all aim to estimate coefficients for the same equation, so it seems they should be separate engines for one model rather than separate models with one engine each.

The only way forward that I can see is for the main "model" function to have no "main arguments", so that all arguments are engine-specific. The downside is that none of the engine-specific arguments can use a dials constructor the way set_model_arg() allows through its func argument, which can point to a function built on e.g. dials::new_qual_param(). This makes tuning the engine-specific arguments less smooth, because the user cannot rely on dials to construct tuning grids for them.
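For context, here is a minimal sketch of how a main argument is registered with a dials constructor through func. The model name "my_model", the engine name "engine_a", and the original argument name "lambda" are hypothetical placeholders, not taken from the package under discussion; dials::penalty() stands in for whatever parameter function would actually apply.

```r
library(parsnip)

# Hypothetical model and engine names, used only for illustration.
set_new_model("my_model")
set_model_mode(model = "my_model", mode = "regression")
set_model_engine(model = "my_model", mode = "regression", eng = "engine_a")

# Register a main argument. `func` names the dials function that
# describes the parameter, which is what lets tuning tools build grids for it.
set_model_arg(
  model = "my_model",
  eng = "engine_a",
  parsnip = "penalty",    # argument name exposed by the parsnip model function
  original = "lambda",    # hypothetical argument name in the underlying engine
  func = list(pkg = "dials", fun = "penalty"),
  has_submodel = FALSE
)

# Inspect what was registered for this model.
registered <- get_from_env("my_model_args")
print(registered)
```

There is no equivalent call for engine-specific arguments, which is the gap this issue asks about.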

Is there an analog to set_model_arg() that I am missing, such as a set_engine_args()?

mmp3 avatar Jul 27 '21 17:07 mmp3

No, that is a main difference between how model and engine arguments are handled.

Do you have a repo you can share for us to take a look at or maybe two example models you are wrapping that you could point out for us to see some specifics of what you mean?

juliasilge avatar Jul 29 '21 14:07 juliasilge

> The algorithms underlying each engine for the common new model are quite heterogeneous, to the point that there is no single parameter that is conceptually the same across all the engines, and the number of conceptually similar arguments between any pair of engines is often zero, or otherwise quite small.

I disagree.

A good counter-example is random forest. All three engines have main arguments that map exactly to the same underlying model parameters.

A slightly less good example is boosted trees. It shares an argument with rand_forest(), namely trees. For both models, this is the number of individual models in the ensemble. The two models handle it differently (one fits its models sequentially over trees and the other does not), but from a function API point of view they can be treated equally.
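As a small illustration of that shared API (a sketch only; the value 500 and the choice of the ranger and xgboost engines are arbitrary), both model functions accept the same trees main argument:

```r
library(parsnip)

# Random forest: `trees` is the number of trees in the ensemble.
rf_spec <- rand_forest(mode = "regression", trees = 500)
rf_spec <- set_engine(rf_spec, "ranger")

# Boosted trees: the same argument name, also the number of trees,
# even though the trees are fit sequentially rather than independently.
bt_spec <- boost_tree(mode = "regression", trees = 500)
bt_spec <- set_engine(bt_spec, "xgboost")

# Both specifications store the value under the same main-argument name.
names(rf_spec$args)
names(bt_spec$args)
```

Only the specifications are created here; nothing is fit, so the engine packages are not needed until fit() is called.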

Similar examples:

  • learning rate in boosted trees and neural networks. Same gradient descent concept used in a different context.
  • nearest neighbors for KNN models and imputation.
  • number of components for PCA, PLS, and ICA models.

I know that there are some between-engine differences in parameters. For example, penalty for glmnet uses a different penalty than the LiblineaR model, but both are doing penalized regression.

If there are places that you feel the main arguments are too different, let us know and we can document this better.

topepo avatar Jul 30 '21 16:07 topepo

Apologies, I may have misread your point. I thought you were referring to existing models.

Did you mean the models that you are specifically working on?

topepo avatar Jul 30 '21 16:07 topepo

@juliasilge

> No, that is a main difference between how model and engine arguments are handled.

OK, thank you.

> Do you have a repo you can share for us to take a look at or maybe two example models you are wrapping that you could point out for us to see some specifics of what you mean?

Yes, I just invited you and @topepo to the repo.

mmp3 avatar Aug 03 '21 15:08 mmp3

@topepo

> Apologies, I may have misread your point. I thought you were referring to existing models.
>
> Did you mean the models that you are specifically working on?

Yes, I was referring to the new models I am trying to implement. I have invited you and @juliasilge to the repo, as requested.

mmp3 avatar Aug 03 '21 15:08 mmp3

Looks like this was resolved privately, so I will go ahead and close.

For folks that come across this in the future, it is indeed possible to register engine arguments for built-in dials support!

simonpcouch avatar Jan 03 '23 15:01 simonpcouch

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions[bot] avatar Jan 18 '23 01:01 github-actions[bot]