RFC make response / inverse link / activation function official

Open lorentzenchr opened this issue 1 year ago • 8 comments

With questions like #29163, and with the private loss functions (#15123) in place almost everywhere, I would like to discuss making the inverse link function public.

Models like LogisticRegression or HistGradientBoostingRegressor(loss="poisson") have predictions like inverse_link(raw_prediction(X)) where raw_prediction(X) is the prediction in "link space", e.g. linear predictor ("eta") for linear models.
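A minimal sketch of this relationship using only today's public API, on made-up toy data. It assumes a binary LogisticRegression, where decision_function stands in for the raw prediction, and a PoissonRegressor, where the linear predictor is computed by hand from coef_ and intercept_:

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression, PoissonRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Binary classification: the raw prediction lives in logit space.
y_clf = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
clf = LogisticRegression().fit(X, y_clf)
raw_clf = clf.decision_function(X)   # linear predictor "eta"
proba = expit(raw_clf)               # inverse link applied to eta

# Poisson regression: the raw prediction lives in log space.
y_reg = rng.poisson(lam=np.exp(0.3 * X[:, 0]))
reg = PoissonRegressor().fit(X, y_reg)
raw_reg = X @ reg.coef_ + reg.intercept_   # linear predictor "eta"
mean = np.exp(raw_reg)                     # inverse link applied to eta
```

Here proba should match clf.predict_proba(X)[:, 1] and mean should match reg.predict(X), which is the inverse_link(raw_prediction(X)) pattern described above.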

In line with the most recent nomenclature of HistGradientBoosting*, I propose the following public API for regressors and classifiers:

  • raw_predict(X)
  • response_function(y_raw) or activation_function(y_raw)
  • link_function(y_obs)

Alternatives:

  1. estimator.link is a link object which has 2 methods named like above.
  2. 1-to-1 with the actual implementation: estimator.loss.link and then as alternative 1. This would also expose the loss function (object), see also #28169.
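To make alternative 1 concrete, here is a rough sketch of what such a link object could look like for the logit link. The class name LogitLink is hypothetical and only the two method names come from the proposal above; this is not an existing scikit-learn class:

```python
import numpy as np
from scipy.special import expit, logit


class LogitLink:
    """Hypothetical link object; names are illustrative only."""

    def link_function(self, y_obs):
        # Map probabilities in (0, 1) to link space (the real line).
        return logit(y_obs)

    def response_function(self, y_raw):
        # Map raw predictions back to probabilities.
        return expit(y_raw)


link = LogitLink()
y_raw = np.array([-2.0, 0.0, 3.0])
y_prob = link.response_function(y_raw)
y_back = link.link_function(y_prob)  # recovers y_raw up to rounding
```

With alternative 2, the same object would presumably be reached through estimator.loss.link instead of estimator.link.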

Further considerations

  • This would also make #18309 easier, or solve it.
  • Does this necessitate a SLEP?

@scikit-learn/communication-team @scikit-learn/contributor-experience-team @scikit-learn/core-devs @scikit-learn/documentation-team ping

lorentzenchr avatar Jun 03 '24 12:06 lorentzenchr

isn't raw_predictions the same as decision_function in effect?

activation_function sounds NN like, so maybe not appropriate.

Overall, I don't think I like the idea of adding these at the estimator level. I personally don't even understand what link_function or response_function would do. However, I'd be happy to have estimator.link documented properly, and then have methods of estimator.link as public methods for people who really want to dig in.

adrinjalali avatar Jun 04 '24 13:06 adrinjalali

The link function is the inverse of what you usually apply in reality:

  • Binary Logistic: return of predict_proba = expit(raw) and link function is logit
  • (Log-)Poisson regression: return of predict = exp(raw) and link function is log
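The two pairs above can be checked numerically. This is only an illustration with scipy.special and NumPy functions; the dictionary and its keys are made up for the example:

```python
import numpy as np
from scipy.special import expit, logit

# (link, inverse link) pairs for the two examples above.
pairs = {
    "binary logistic": (logit, expit),
    "log-Poisson": (np.log, np.exp),
}

raw = np.array([-1.5, 0.0, 2.0])

# Applying the inverse link and then the link returns the raw values.
round_trip = {
    name: link(inverse(raw)) for name, (link, inverse) in pairs.items()
}
```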

isn't raw_predictions the same as decision_function in effect?

For classifiers, in effect, yes. But not API-wise. Also, decision_function is one of the greatest misnomers of our API, IMO. The existence of decision_function is also not guaranteed, from what I read in our docs:

Classification algorithms usually also offer a way to quantify certainty of a prediction, either using decision_function or predict_proba

lorentzenchr avatar Jun 04 '24 14:06 lorentzenchr

Side note: I guess that for the softmax inverse link used for the multiclass log-loss, we would expose softmax as the inverse link and log as the (forward) link, since softmax is not bijective.
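A small numerical illustration of why softmax is not bijective and what log recovers, using scipy.special.softmax on made-up raw scores:

```python
import numpy as np
from scipy.special import log_softmax, softmax

raw = np.array([[1.0, -0.5, 2.0]])

proba = softmax(raw, axis=1)
# Shifting all raw scores of a sample by a constant leaves the
# probabilities unchanged, so softmax has no unique inverse.
proba_shifted = softmax(raw + 10.0, axis=1)

# log recovers the raw scores only up to a per-row additive constant.
recovered = np.log(proba)
offset = raw - recovered  # the same value in every column of a row
```

Applying softmax to recovered gives back proba, so log still works as a forward link in the sense that softmax(log(p)) = p.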

Main comment to the core of the RFC:

I think I would intuitively favor making estimator.loss.link public instead of introducing new methods on the estimator itself, but we would still need to expose a raw_predict method on the estimator to make that useful.

I agree that decision_function is a really bad name in retrospect, especially now that we have started introducing cost-sensitive learning tools such as TunedThresholdClassifierCV, where the "decision" would better represent the thresholded predict_proba of a binary classifier, for instance.

So we could introduce raw_predict (at least for classifiers and regressors; not sure about clustering algorithms).

Then we could soft-deprecate decision_function (e.g. stop using it and hide every occurrence of it in the docs in favor of raw_predict for classifiers, but keep decision_function as a pure backward-compatible alias to avoid breaking user code).
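One way such a backward-compatible alias could look, as a sketch only. Both RawPredictAliasMixin and ToyLinearClassifier are hypothetical names invented for this example and do not exist in scikit-learn:

```python
import numpy as np


class RawPredictAliasMixin:
    """Hypothetical mixin; not part of scikit-learn."""

    def decision_function(self, X):
        # Pure backward-compatible alias: no warning, same return value.
        return self.raw_predict(X)


class ToyLinearClassifier(RawPredictAliasMixin):
    """Toy stand-in for an already fitted binary linear classifier."""

    def __init__(self, coef, intercept):
        self.coef_ = np.asarray(coef, dtype=float)
        self.intercept_ = float(intercept)

    def raw_predict(self, X):
        # Linear predictor in link space.
        return np.asarray(X) @ self.coef_ + self.intercept_


clf = ToyLinearClassifier(coef=[1.0, -2.0], intercept=0.5)
X = np.array([[1.0, 0.0], [0.0, 1.0]])
```

Existing user code calling clf.decision_function(X) keeps working, while the documentation would only mention raw_predict.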

We would definitely need a SLEP for such an impactful public API change though.

ogrisel avatar Jun 11 '24 14:06 ogrisel

Note that the question in #29163 should first be addressed by updating our user guide (and maybe cross-referencing it from the docstrings of the predict (for regressors) and predict_proba (for classifiers) methods).

ogrisel avatar Jun 11 '24 14:06 ogrisel

I think raw_predict makes sense.

I'm unsure about exposing the loss objects through estimator.loss. Can the link functions be regular functions, with the docstring of raw_predict stating how to generate predict or predict_proba?

thomasjpfan avatar Jun 14 '24 03:06 thomasjpfan

In principle, loss and link function are independent. It's only an efficient implementation that needs to merge them together. As written in the RFC, having link and inverse link function available on the estimator would help several use cases and reduce confusion.

lorentzenchr avatar Sep 28 '24 12:09 lorentzenchr

I think adding three new methods to estimators is a pretty big API update. I'm trying to find a path to get some momentum, like standardizing on raw_predict.

After raw_predict is out, then there is a standard way to do this:

log_reg = LogisticRegression().fit(X, y)
raw_pred = log_reg.raw_predict(X)

# If they want to get the actual predictions, they need to know the inverse link:
y_pred = expit(raw_pred)

I think this is already a net win.

Afterwards, if we want to go further, we can extend the API:

y_pred = log_reg.link.inverse(raw_pred)

raw_again = log_reg.link(y_pred)

thomasjpfan avatar Sep 28 '24 18:09 thomasjpfan

But raw_predict without something like response_function/inverse is quite meaningless. Maybe we don't need the link function itself, only its inverse.

EDIT: Ok, starting with only raw_predict makes sense and already enables new things.

I think adding three new methods to estimators is a pretty big API update.

Yes, it is. But again, those have been missing from the beginning, and they are quite fundamental (for any loss except the squared error and quantile loss).

lorentzenchr avatar Sep 28 '24 21:09 lorentzenchr

In the short term, I'm thinking of having the inverse link function in the docstring, so the raw_predict is meaningful:

def raw_predict(self, X):
    """Return the raw prediction.

    `predict`'s output is the same as `inverse_link(raw_prediction)`, where 
    the inverse link function is `scipy.special.expit`.
    """

For estimators that can adjust their loss, we'll list out the inverse link functions.
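For instance, such a list could be written down as a simple mapping. The sketch below assumes the loss names accepted by HistGradientBoostingRegressor, with "poisson" and "gamma" using a log link (inverse exp) and the others an identity link; INVERSE_LINK_BY_LOSS and predictions_from_raw are hypothetical names, not scikit-learn API:

```python
import numpy as np

# Assumed mapping from the `loss` parameter to the inverse link,
# written out for documentation purposes only.
INVERSE_LINK_BY_LOSS = {
    "squared_error": lambda raw: raw,
    "absolute_error": lambda raw: raw,
    "quantile": lambda raw: raw,
    "poisson": np.exp,
    "gamma": np.exp,
}


def predictions_from_raw(loss, raw):
    """Hypothetical helper: apply the inverse link matching `loss`."""
    return INVERSE_LINK_BY_LOSS[loss](np.asarray(raw, dtype=float))
```

A user would then compute predictions_from_raw("poisson", raw) on the output of raw_predict instead of having to remember that the Poisson loss uses exp.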

thomasjpfan avatar Oct 09 '24 19:10 thomasjpfan

Another use case for raw_predict is #28574 which needs access to logits.

lorentzenchr avatar Feb 18 '25 19:02 lorentzenchr

I think we would need to clarify:

  • if we want to expose raw_predict for all supervised estimators in scikit-learn or only for models with an explicit link function (GLMs, MLPs, Gradient Boosting...);
  • how it is defined for non-probabilistic classifiers such as SVC (and variants): is it always mapped to decision_function for a classifier?

ogrisel avatar Mar 06 '25 08:03 ogrisel

We discussed it in the monthly meeting https://github.com/scikit-learn/administrative/blob/master/monthly_meetings/2025-02-24.md and agreed on having a raw_predict, officially.

And yes, for non-probabilistic classifiers, it returns the same as decision_function. We also discussed deprecating decision_function (with a long cycle). No hard objections were mentioned. I would be +1 for that.

lorentzenchr avatar Mar 06 '25 09:03 lorentzenchr