`set_engine_args`?
I am constructing a parsnip-adjacent package that implements a new parsnip model with several engines.
The algorithms underlying each engine for the common new model are quite heterogeneous, to the point that there is no single parameter that is conceptually the same across all the engines, and the number of conceptually similar arguments between any pair of engines is often zero, or otherwise quite small. Yet, they all aim to estimate coefficients for the same equation, so it seems they should be separate engines for one model rather than separate models with one engine each.
The only way forward that I can see is for the main "model" function to have no "main arguments", with all arguments being engine-specific. The downside is that none of the engine-specific arguments can benefit from a constructor from dials in the way that `set_model_arg()` takes an argument `func` that can refer to a function based on e.g. `dials::new_qual_param()`. This will make tuning the engine-specific arguments less smooth, because the user won't be able to use dials to construct tuning grids for them.
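For concreteness, here is a minimal sketch of the kind of registration I mean; the model, engine, and argument names are placeholders:

```r
library(parsnip)

# Register a main model argument and point it at a dials constructor,
# so dials can build tuning grids for it automatically.
set_model_arg(
  model = "my_model",    # placeholder model name
  eng = "engine_a",      # placeholder engine name
  parsnip = "penalty",   # user-facing argument name
  original = "lambda",   # argument name in the engine's fit function
  func = list(pkg = "dials", fun = "penalty"),
  has_submodel = FALSE
)
```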
Is there an analog to `set_model_arg()` that I am missing, like a `set_engine_args()`?
No, that is one of the main differences between how model and engine arguments are handled.
Do you have a repo you can share for us to take a look at, or maybe two example models you are wrapping that you could point out, so we can see some specifics of what you mean?
> The algorithms underlying each engine for the common new model are quite heterogeneous, to the point that there is no single parameter that is conceptually the same across all the engines, and the number of conceptually similar arguments between any pair of engines is often zero, or otherwise quite small.
I disagree.
A good counter-example is random forest: all three engines have main arguments that map exactly to the same underlying model parameters.
A slightly less good example is boosted trees, which shares an argument with `rand_forest()`, namely `trees`. For both models, this is the number of individual models in the ensemble. The engines handle it differently (one fits models sequentially over `trees` and the other does not), but from a function API point of view they can be treated identically.
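For example, both specifications expose `trees` through the same interface:

```r
library(parsnip)

# The same main argument, even though the engines consume it very
# differently under the hood.
rand_forest(trees = 500) %>% set_engine("ranger")
boost_tree(trees = 500) %>% set_engine("xgboost")
```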
Similar examples:
- learning rate in boosted trees and neural networks: the same gradient descent concept used in a different context.
- nearest neighbors for KNN models and imputation.
- number of components for PCA, PLS, and ICA models.
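Each of these concepts maps to a single dials parameter object that is reused across models:

```r
library(dials)

learn_rate()  # boosted trees and neural networks
neighbors()   # KNN models and imputation
num_comp()    # PCA, PLS, and ICA
```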
I know that there are some between-engine differences in parameters: `penalty` for glmnet uses a different penalty than the LiblineaR model, but both are doing penalized regression.
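So from the tuning side, one dials object covers both engines, for example:

```r
library(dials)

# The grid is engine-agnostic; the engine decides what the penalty means.
grid_regular(penalty(), levels = 5)
```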
If there are places that you feel the main arguments are too different, let us know and we can document this better.
Apologies, I may have misread your point. I thought you were referring to existing models.
Did you mean the models that you are specifically working on?
@juliasilge
> No, that is one of the main differences between how model and engine arguments are handled.
OK, thank you.
> Do you have a repo you can share for us to take a look at, or maybe two example models you are wrapping that you could point out, so we can see some specifics of what you mean?
Yes, I just invited you and @topepo to the repo.
@topepo
> Apologies, I may have misread your point. I thought you were referring to existing models.
> Did you mean the models that you are specifically working on?
Yes, I was referring to the new models I am trying to implement. I have invited you and @juliasilge to the repo, as requested.
Looks like this was resolved privately, so I will go ahead and close.
For folks that come across this in the future, it is indeed possible to register engine arguments for built-in dials support!
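As a rough sketch of one way to wire this up (the class and argument names below are hypothetical), a `tunable()` method can point an engine-specific argument at a dials constructor:

```r
# Hypothetical model spec class `my_model` with an engine argument
# `my_engine_arg`; call_info tells tune/dials which constructor to use.
tunable.my_model <- function(x, ...) {
  tibble::tibble(
    name = "my_engine_arg",
    call_info = list(list(pkg = "dials", fun = "penalty")),
    source = "model_spec",
    component = "my_model",
    component_id = "engine"
  )
}
```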
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.