scikit-learn
RFC make response / inverse link / activation function official
With questions like #29163, and with the private loss functions (#15123) now in place almost everywhere, I would like to discuss making the inverse link function public.
Models like LogisticRegression or HistGradientBoostingRegressor(loss="poisson") have predictions like inverse_link(raw_prediction(X)) where raw_prediction(X) is the prediction in "link space", e.g. linear predictor ("eta") for linear models.
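The relationship described above can be checked with today's public API. The following is a minimal sketch, assuming a synthetic binary dataset from `make_classification` and a default `LogisticRegression`; the linear predictor is computed by hand from `coef_` and `intercept_` because no public `raw_predict` exists yet.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (an assumption for illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Linear predictor ("eta") in link space, computed from the fitted parameters.
raw = X @ clf.coef_.ravel() + clf.intercept_[0]

# Applying the inverse link (expit) gives the positive-class probability.
proba = expit(raw)
```

Here `raw` matches what `decision_function` returns for the binary case, and `proba` matches the second column of `predict_proba`.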
In line with the most recent nomenclature of HistGradientBoosting*, I propose the following public API for regressors and classifiers:
- raw_predict(X)
- response_function(y_raw) or activation_function(y_raw)
- link_function(y_obs)
Alternatives:
- estimator.link is a link object which has 2 methods named like above.
- 1-to-1 with the actual implementation: estimator.loss.link and then as alternative 1. This would also expose the loss function (object), see also #28169.
Further considerations
- This would also make easier/solve #18309
- Does this necessitate a SLEP?
@scikit-learn/communication-team @scikit-learn/contributor-experience-team @scikit-learn/core-devs @scikit-learn/documentation-team ping
isn't raw_predictions the same as decision_function in effect?
activation_function sounds NN like, so maybe not appropriate.
Overall, I don't think I like the idea of adding these at the estimator level. I personally don't even understand what link_function or response_function would do. However, I'd be happy to have estimator.link documented properly, and then have methods of estimator.link as public methods for people who really want to dig in.
The link function is the inverse of what you usually apply in reality:
- Binary Logistic: return of predict_proba = expit(raw) and link function is logit
- (Log-)Poisson regression: return of predict = exp(raw) and link function is log
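The two pairs listed above can be illustrated without any estimator. This sketch uses arbitrary raw values (an assumption) and only `scipy.special.expit`, `scipy.special.logit`, `numpy.exp` and `numpy.log` to show that each link undoes its inverse link.

```python
import numpy as np
from scipy.special import expit, logit

# Arbitrary values in link space, chosen for illustration.
raw = np.array([-2.0, 0.0, 3.0])

# Binary logistic: inverse link is expit, link is logit.
p = expit(raw)          # probabilities in (0, 1)
raw_from_p = logit(p)   # back to link space

# Poisson with log link: inverse link is exp, link is log.
mu = np.exp(raw)           # positive expected counts
raw_from_mu = np.log(mu)   # back to link space
```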
isn't raw_predictions the same as decision_function in effect?
For classifiers, in effect, yes. But not API-wise. Also, decision_function is one of the greatest misnomers of our API, IMO. The existence of decision_function is also not guaranteed, from what I read in our docs:
Classification algorithms usually also offer a way to quantify certainty of a prediction, either using decision_function or predict_proba
Side note: I guess that for the softmax inverse link used for the multiclass log-loss, we would expose softmax as inverse link and log as (forward) link since softmax is not bijective.
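To make the non-bijectivity concrete, here is a small sketch with made-up raw scores (an assumption). It uses `scipy.special.softmax` to show that shifting all raw scores by a constant leaves the probabilities unchanged, so `log` recovers the raw scores only up to a per-row constant.

```python
import numpy as np
from scipy.special import softmax

# Made-up raw scores for one sample and three classes.
raw = np.array([[1.0, 2.0, 3.0]])
shifted = raw + 10.0

# Both raw vectors map to the same probabilities: softmax is not injective.
proba = softmax(raw, axis=1)
proba_shifted = softmax(shifted, axis=1)

# Using log as the forward link gives a valid raw prediction,
# but not necessarily the original one.
recovered = np.log(proba)
offset = recovered - raw  # constant within each row
```

Applying softmax to `recovered` returns `proba` again, which is why `log` is a usable forward link even though it is not a true inverse.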
Main comment to the core of the RFC:
I think I would intuitively favor making estimator.loss.link public instead of introducing new methods on the estimator itself, but we would still need to expose a raw_predict method on the estimator to make that useful.
I agree that decision_function is a really bad name in retrospect, especially now that we started introducing cost-sensitive learning tools such as TunedThresholdClassifierCV where the "decision" would better represent the thresholded predict_proba of a binary classifier for instance.
So we could introduce raw_predict (at least for classifiers, regressors, not sure about clustering algorithms).
Then we could soft deprecate (e.g. stop using and hide in the doc any occurrence of decision_function in favor of raw_predict for classifiers, but keep the decision_function as a pure backward compat alias to avoid breaking user code).
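A soft deprecation of this kind could look roughly like the sketch below. `ToyLinearClassifier` is a hypothetical class written for illustration and is not scikit-learn code; `raw_predict` is the name proposed in this thread, and `decision_function` simply forwards to it without emitting a warning.

```python
import numpy as np


class ToyLinearClassifier:
    """Hypothetical binary linear classifier, for illustration only."""

    def __init__(self, coef, intercept):
        self.coef_ = np.asarray(coef, dtype=float)
        self.intercept_ = float(intercept)

    def raw_predict(self, X):
        # Proposed primary method: prediction in link space.
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def decision_function(self, X):
        # Backward-compatible alias: no warning, so existing user code keeps working.
        return self.raw_predict(X)


clf = ToyLinearClassifier(coef=[0.5, -1.0], intercept=0.25)
X = np.array([[1.0, 2.0], [3.0, 0.0]])
raw = clf.raw_predict(X)
legacy = clf.decision_function(X)
```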
We would definitely need a SLEP for such an impactful public API change though.
Note that the question in #29163 should first be addressed by updating our user guide (and maybe cross-referencing it from the docstrings of the predict (for regressors) and predict_proba (for classifiers) methods).
I think raw_predict makes sense.
I'm unsure about exposing the loss objects through estimator.loss. Can the link functions be regular functions, with the docstring of raw_predict stating how to generate predict or predict_proba?
In principle, loss and link function are independent. It's only an efficient implementation that needs to merge them together. As written in the RFC, having link and inverse link function available on the estimator would help several use cases and reduce confusion.
I think adding three new methods to estimators is a pretty big API update. I'm trying to find a path to get some momentum, like standardizing on raw_predict.
After raw_predict is out, then there is a standard way to do this:
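from scipy.special import expit
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression().fit(X, y)
raw_pred = log_reg.raw_predict(X)  # proposed method, not available yet
# If they want to get the actual predictions, they need to know the inverse link:
y_pred = expit(raw_pred)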
I think this is already a net win.
Afterwards, if we want to go further, we can extend the API:
y_pred = log_reg.link.inverse(raw_pred)
raw_again = log_reg.link(y_pred)
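The extended API above does not exist in scikit-learn; as a rough sketch of what such a link object might look like, the hypothetical `LogitLink` class below is callable for the forward link and offers an `inverse` method, both backed by `scipy.special`.

```python
import numpy as np
from scipy.special import expit, logit


class LogitLink:
    """Hypothetical link object, not part of scikit-learn."""

    def __call__(self, y_pred):
        # Forward link: probability -> raw prediction (log-odds).
        return logit(y_pred)

    def inverse(self, raw_pred):
        # Inverse link: raw prediction -> probability.
        return expit(raw_pred)


link = LogitLink()
raw_pred = np.array([-1.5, 0.0, 2.0])  # stand-in for log_reg.raw_predict(X)
y_pred = link.inverse(raw_pred)
raw_again = link(y_pred)
```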
But raw_predict without something like response_function/inverse is quite meaningless. Maybe we don't need the link function itself, only its inverse.
EDIT: OK, starting with only raw_predict makes sense and already enables new things.
I think adding three new methods to estimators is a pretty big API update.
Yes, it is. But again, those have been missing from the beginning, as they are quite fundamental (for any loss except the squared error and quantile loss).
In the short term, I'm thinking of having the inverse link function in the docstring, so the raw_predict is meaningful:
def raw_predict(self, X):
    """Return the raw prediction.

    `predict`'s output is the same as `inverse_link(raw_prediction)`, where
    the inverse link function is `scipy.special.expit`.
    """
For estimators that can adjust their loss, we'll list out the inverse link functions.
Another use case for raw_predict is #28574 which needs access to logits.
I think we would need to clarify:
- if we want to expose raw_predict for all supervised estimators in scikit-learn or only for models with an explicit link function (GLMs, MLPs, Gradient Boosting...);
- how it is defined for non-probabilistic classifiers such as SVC (and variants): is it always mapped to decision_function for a classifier?
We discussed it in the monthly meeting https://github.com/scikit-learn/administrative/blob/master/monthly_meetings/2025-02-24.md and agreed on having a raw_predict, officially.
And yes, for non-probabilistic classifiers, it returns the same as decision_function. We also discussed deprecating decision_function (with a long cycle). No hard objections were mentioned. I would be +1 for that.
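For a non-probabilistic classifier there is no inverse link to apply, so the raw prediction is just the score that `decision_function` returns today. A minimal sketch, assuming a synthetic dataset and `LinearSVC` as the example model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic binary problem (an assumption for illustration only).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
svc = LinearSVC().fit(X, y)

# Signed distance-like scores; this is what raw_predict would return.
scores = svc.decision_function(X)

# LinearSVC offers no probability estimates.
has_proba = hasattr(svc, "predict_proba")

# The predicted class follows from the sign of the score.
labels = svc.classes_[(scores > 0).astype(int)]
```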