model vs engine

Open mmp3 opened this issue 3 years ago • 4 comments

I am developing a package which implements extensions to parsnip.

I am trying to understand the parsnip design principle of a model versus an engine so that I can implement the extensions without incurring redundancy or confusion.

I have read the Details section of the documentation pages for parsnip models:

"This function only defines what type of model is being fit. Once an engine is specified, the method to fit the model is also defined."

and I am following the excellent parsnip guide for building new models, but the boundary between model and engine is not yet completely clear to me.

Here are the two particular cases on my mind:

  1. Logistic regression with a specific linear equality constraint on the regression coefficients (and with L1 and/or L2 regularization). Should this be a new engine to logistic_reg, or a new model entirely? It would have the exact same main arguments as logistic_reg, so I am leaning towards "engine", but the constraint makes the fitted model fundamentally different from the fitted model that can be obtained with any other engine for logistic_reg. For what it's worth, the constraint also gives the fitted regression coefficients a special interpretation that is useful in its own right and that the user would want to know about.

  2. k-nearest neighbors with a user-defined distance function. Should this be a new engine to nearest_neighbor, or a new model entirely? It would require a new main argument for the model (e.g. dist_func) - is that too much for an engine? No other engine could produce this result because they do not accept arbitrary distance functions.

Expanding on these two specific cases, here are some overarching questions about the boundary between model and engine:

  1. Conceptually, are engines just different underlying implementations for achieving essentially the same range of fitted models? That is, should we be able to take any two engines for the same model, and with enough parameter tweaking, obtain essentially the same fitted model from each?

  2. If a candidate engine for a model would be able to return a fitted model that is fundamentally different from the fitted models returned from the other engines for the same model - not just a better fit, but fundamentally different in that other fits do not approximate it (e.g. constrained regression coefficients) - then should that candidate engine in fact be a new standalone model?

  3. If a candidate model is very similar to an existing model in parsnip, does having different "main arguments" mean that the candidate model should be implemented as a new standalone model in parsnip rather than as a new engine for the extant model?

mmp3, Jul 22 '21 16:07

Think of the model type as the structure of the model/prediction equation. So logistic_reg() captures models that have a linear predictor that models some monotonic transformation of a binomial/Bernoulli probability.

The engine can be thought of as the estimation method. So logistic_reg(engine = "glm") means basic maximum likelihood, logistic_reg(engine = "stan") means Bayesian estimation, etc.
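A minimal sketch of that distinction, assuming a recent parsnip release. The data preparation (`dat`, built from `mtcars`) and the formula are illustrative choices and not part of the discussion; fitting the "stan" specification would additionally need the rstanarm package, so only the "glm" one is fitted here.

```r
library(parsnip)

# One model type: a linear predictor for a transformed class probability.
# Two engines: different estimators of that same structural equation.
spec_ml    <- logistic_reg() %>% set_engine("glm")   # maximum likelihood
spec_bayes <- logistic_reg() %>% set_engine("stan")  # Bayesian estimation

# logistic_reg() expects a factor outcome (illustrative data, not from the thread).
dat <- mtcars
dat$am <- factor(dat$am, labels = c("automatic", "manual"))

# The user-facing interface is the same whichever engine is chosen.
fit_ml <- fit(spec_ml, am ~ hp + wt, data = dat)
probs  <- predict(fit_ml, new_data = head(dat), type = "prob")
```

Swapping `spec_ml` for `spec_bayes` in the `fit()` call changes the estimator and the underlying fitted object, but not the calls the user writes.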

I would encourage you to re-use as many parsnip model functions as you can and add new engines. I think that the main arguments are frozen; at most, we'd add a new one if it is a commonly used parameter that we've missed.
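As a rough sketch of what adding an engine to an existing model function can look like, the block below registers `MASS::rlm` as an engine for `linear_reg()`, following the registration pattern from the parsnip guide for building new models. The choice of `rlm` is only a stand-in for an extension package's own estimator, and the encoding and prediction settings are assumptions that a real engine would need to review.

```r
library(parsnip)

# Register a new engine for an existing model type and mode.
set_model_engine("linear_reg", mode = "regression", eng = "rlm")
set_dependency("linear_reg", eng = "rlm", pkg = "MASS")

# How to call the fitting function.
set_fit(
  model = "linear_reg", eng = "rlm", mode = "regression",
  value = list(
    interface = "formula",
    protect = c("formula", "data", "weights"),
    func = c(pkg = "MASS", fun = "rlm"),
    defaults = list()
  )
)

# How predictors are encoded before reaching the fitting function.
set_encoding(
  model = "linear_reg", eng = "rlm", mode = "regression",
  options = list(
    predictor_indicators = "traditional",
    compute_intercept = TRUE,
    remove_intercept = TRUE,
    allow_sparse_x = FALSE
  )
)

# How to produce numeric predictions from the fitted object.
set_pred(
  model = "linear_reg", eng = "rlm", mode = "regression", type = "numeric",
  value = list(
    pre = NULL,
    post = NULL,
    func = c(fun = "predict"),
    args = list(
      object = rlang::expr(object$fit),
      newdata = rlang::expr(new_data),
      type = "response"
    )
  )
)

# The new engine is then used through the existing model function.
rlm_fit <- linear_reg() %>%
  set_engine("rlm") %>%
  fit(mpg ~ wt + hp, data = mtcars)
preds <- predict(rlm_fit, new_data = head(mtcars, 3))
```

No new model function is defined: users keep calling `linear_reg()` and only the engine name changes.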

The good thing is that we can automate the usage and tuning of engine arguments pretty easily. For example, see this blog post.

I suggest working on a parsnip-adjacent package and asking for review before you finalize it. You certainly don't need our approval but there have been a number of parsnip-adjacent packages lately that work but have interfaces that are fundamentally opposite of what we are trying to do with tidymodels and the tidyverse.

Some specific answers to your questions:

> should we be able to take any two engines for the same model, and with enough parameter tweaking, obtain essentially the same fitted model from each?

No. They can be different estimators of the same structural equation.

> If a candidate engine for a model would be able to return a fitted model that is fundamentally different from the fitted models returned from the other engines for the same model ... then should that candidate engine in fact be a new standalone model?

No. The underlying model objects across engines are usually very different. That's encouraged. We want to make sure that the API/experience that users have is the same. Most of that is outlined here.

> If a candidate model is very similar to an existing model in parsnip, does having different "main arguments" mean that the candidate model should be implemented as a new standalone model in parsnip rather than as a new engine for the extant model?

No. We'd encourage you to use the same model function (with the same main arguments) and use dials and a tunable() method to make it easy to use your engine with engine-specific arguments.
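A small sketch of engine-specific arguments, assuming the existing "glmnet" engine as a stand-in for a new one. `lower.limits` is a glmnet argument that box-constrains the coefficients; it is not the linear equality constraint from the question, and is used here only to show where such an argument would go. Creating the specification does not fit anything, so no data is involved.

```r
library(parsnip)

# Main arguments (penalty, mixture) stay on the model function;
# anything specific to the engine is passed through set_engine().
spec <- logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet", lower.limits = 0)

# Main arguments can be marked for tuning with tune() placeholders,
# while the engine-specific argument is passed alongside them.
spec_tuned <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet", lower.limits = 0)
```

An extension engine's own arguments (for example a hypothetical `dist_func` for a nearest-neighbor engine) would be supplied through `set_engine()` in the same way, leaving the main arguments of the model function unchanged.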

topepo, Jul 22 '21 21:07

Thank you for the helpful explanations; it is clearer now.

> I suggest working on a parsnip-adjacent package and asking for review before you finalize it. You certainly don't need our approval but there have been a number of parsnip-adjacent packages lately that work but have interfaces that are fundamentally opposite of what we are trying to do with tidymodels and the tidyverse.

Sure, I would certainly appreciate feedback and approval. What is the preferred avenue for notifying the team and sharing the prospective package for feedback?

mmp3, Jul 26 '21 19:07

Put it in a GH repo and tag me when you have questions 👍

topepo, Jul 27 '21 18:07

> Put it in a GH repo and tag me when you have questions 👍

OK, I invited you and @juliasilge to the repo and posted some questions as issues and tagged both of you in those issues.

mmp3, Aug 03 '21 15:08

Glad we have this discussion documented publicly! As it's been a couple years since follow-up here, I'll go ahead and close.

simonpcouch, Mar 09 '23 15:03
