
[FEATURE] Add `LinearEmbedder` model to the mix.

Open · koaning opened this issue 6 months ago · 1 comment

In this live stream I show that KNN can perform on par with boosted models once you improve the representation of X. A quick trick to do this is to first fit a linear model and then use its trained coefficients to rescale X before indexing it.

This implementation shows the idea:

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor


class RidgeKNNRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, n_neighbors=5, coef_fit=False, weights="uniform"):
        self.n_neighbors = n_neighbors
        self.coef_fit = coef_fit
        self.weights = weights

    def fit(self, X, y):
        if self.coef_fit:
            # Fit a linear model and use its coefficients to rescale X
            # before handing it to the KNN index.
            self.mod_ = Ridge(fit_intercept=False).fit(X, y)
            X = X * self.mod_.coef_
        self.knn_ = KNeighborsRegressor(n_neighbors=self.n_neighbors, weights=self.weights).fit(X, y)
        return self

    def predict(self, X):
        if self.coef_fit:
            # Apply the same rescaling at prediction time.
            X = X * self.mod_.coef_
        return self.knn_.predict(X)
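
For completeness, here is one way you might sanity-check the effect of the rescaling. The diabetes dataset and the 5-fold setup are just placeholders for illustration, not numbers from the stream:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Compare plain KNN against KNN on the coefficient-rescaled space.
plain = cross_val_score(RidgeKNNRegressor(coef_fit=False), X, y, cv=5).mean()
scaled = cross_val_score(RidgeKNNRegressor(coef_fit=True), X, y, cv=5).mean()
print(f"mean R^2 plain KNN: {plain:.3f}, rescaled KNN: {scaled:.3f}")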

Instead of doing both the embedding and the KNN in one go, though, I think it would be nicer to split this up and have a dedicated (meta?) estimator that can add this context to X.
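
A minimal sketch of what that could look like, assuming a plain transformer interface; the name `LinearEmbedder`, the `estimator` parameter, and the Ridge default are my assumptions here, not a settled API:

from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import Ridge
from sklearn.utils.validation import check_is_fitted


class LinearEmbedder(BaseEstimator, TransformerMixin):
    """Rescale X with the coefficients of a fitted linear model."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y):
        # Default to Ridge without an intercept, mirroring the snippet above.
        est = self.estimator if self.estimator is not None else Ridge(fit_intercept=False)
        self.estimator_ = clone(est).fit(X, y)
        return self

    def transform(self, X):
        check_is_fitted(self, "estimator_")
        return X * self.estimator_.coef_

The KNN part then becomes a regular pipeline step:

from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(LinearEmbedder(), KNeighborsRegressor(n_neighbors=5))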

@FBruzzesi I can pick this up, but feel free to tell me if you doubt this idea.

koaning · Aug 14 '24 14:08