MapieQuantileRegressor with prefit model from Keras/Tensorflow

Open dani-vu opened this issue 1 year ago • 1 comments

I want to apply CQR with a customized LSTM model created with Tensorflow. However, it does not support Tensorflow models. Is there a workaround or am I missing something?

Thanks!

dani-vu avatar May 24 '24 10:05 dani-vu

Hey @dani-vu, thank you for the issue. I believe that if you use cv="prefit", you should be able to use MapieQuantileRegressor by simply wrapping your models as in issue #340. Note that you need to fit all three models and provide them in the following order:

    estimators_: List[RegressorMixin]
        - [0]: Estimator with quantile value of alpha/2
        - [1]: Estimator with quantile value of 1 - alpha/2
        - [2]: Estimator with quantile value of 0.5
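
As a small illustration of the ordering above, here is a minimal sketch that maps a chosen alpha to the three quantile levels in that order. The helper name `quantile_levels` is hypothetical and is not part of MAPIE.

```python
def quantile_levels(alpha: float) -> list[float]:
    """Return the quantile levels in the order listed above:
    lower bound, upper bound, median. (Hypothetical helper, not a MAPIE function.)"""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return [alpha / 2, 1 - alpha / 2, 0.5]


# With alpha = 0.1, the three models target roughly the 5%, 95% and 50% quantiles.
levels = quantile_levels(0.1)
print(levels)
```

Each prefit model would then be trained for the level at the same position in this list.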

Don't hesitate to ask if you have any other questions!

LacombeLouis avatar May 24 '24 17:05 LacombeLouis

Hello,

We’re closing this issue due to inactivity, as we haven’t received a response in over a month. If you still need assistance or have more information to provide, please feel free to reopen the issue or create a new one.

Thank you!

jawadhussein462 avatar Nov 07 '24 16:11 jawadhussein462

Hi!

I'm reopening this issue because I'm dealing with the same problem with a simple pre-trained Keras regression model.

It's not clear to me what those three estimators consist of, or whether they would require retraining my model.

Could you please provide some guidance on how to use MapieQuantileRegressor with a pre-trained Keras model? I haven't found much more information anywhere.

This is an example script I've developed for the California Housing dataset:

import pandas as pd

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.optimizers import Adam

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

from mapie.regression import MapieQuantileRegressor

################## PREPARE DATA ##################

data = fetch_california_housing()
X, y = data.data, data.target

scaler = StandardScaler()
X = scaler.fit_transform(X)

X_train, X_test_cal, y_train, y_test_cal = train_test_split(X, y, test_size=0.3, random_state=42)
X_test, X_cal, y_test, y_cal = train_test_split(X_test_cal, y_test_cal, test_size=0.5, random_state=42)

print('Train: ', len(X_train))
print('Test: ', len(X_test))
print('Calibration: ', len(X_cal))

######################## TRAIN AND SAVE MODEL ########################

nn_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1)
])

nn_model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')

nn_model.fit(X_train, y_train, epochs=20, batch_size=32,
             validation_split=0.2, verbose=0)

nn_model.save('model.keras')

####################### LOAD AND WRAP MODEL ########################

class TrainedKerasRegressorWrapper(BaseEstimator, RegressorMixin):
    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        return self

    def predict(self, X):
        return self.model.predict(X).flatten()

    def __sklearn_is_fitted__(self):
        return True


loaded_model = load_model('model.keras')

model = TrainedKerasRegressorWrapper(loaded_model)

######################## QUANTILE REGRESSION #######################

model_list = [model_1, model_2, model_3]  # <-- How can I get these models?

mapie_regressor = MapieQuantileRegressor(
    estimator=model_list, cv='prefit')

mapie_regressor.fit(X_cal, y_cal)

predictions, intervals = mapie_regressor.predict(X_test)

lower_intervals = intervals[:, 0]
upper_intervals = intervals[:, 1]

results = pd.DataFrame({
    'Prediction': predictions.flatten(),
    'Lower Interval': lower_intervals.flatten(),
    'Upper Interval': upper_intervals.flatten(),
    'Amplitude': upper_intervals.flatten() - lower_intervals.flatten(),
    'Actual Value': y_test
})

results.head()

Thank you! :-)

manjavacas avatar Nov 19 '24 09:11 manjavacas

Hello @manjavacas.

Let's say you set alpha = 0.1. The MapieQuantileRegressor uses 3 models:

  • the usual model that is used to predict y_true given X
  • a model that predicts the lower bound of the intervals (0.1/2 = 5% quantile)
  • a model that predicts the upper bound of the intervals ((1-0.1/2) = 95% quantile)

This way, you expect y_true to fall between the interval bounds about 1 - 0.1 = 90% of the time (95% - 5%).
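
To check that expectation on held-out data, one option is to measure the empirical coverage of the intervals. The following is a minimal sketch in plain Python; the function name `empirical_coverage` and the toy numbers are made up for illustration.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside [lower, upper], bounds included."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("y_true, lower and upper must have the same length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# Toy data (made up): the third target lies below its lower bound.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.5, 1.5, 3.2, 3.0, 4.0]
upper = [1.5, 2.5, 4.0, 5.0, 6.0]

coverage = empirical_coverage(y_true, lower, upper)
print(coverage)  # 4 of the 5 toy targets are covered
```

On a real test set, this value should be close to or above 1 - alpha if the intervals are well calibrated.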

To get those last 2 models, you need to fit them using the pinball loss, a loss that takes a parameter tau:

  • the first one with tau = 0.1/2 = 0.05 = 5%
  • the second one with tau = (1-0.1/2) = 0.95 = 95%

To understand how to create a pinball loss, you can check this link for example: https://stackoverflow.com/questions/43151694/define-pinball-loss-function-in-keras-with-tensorflow-backend
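
To build some intuition for why the pinball loss yields quantiles, here is a minimal framework-free sketch (plain Python, not the Keras version). It searches for the constant prediction with the lowest pinball loss on toy data; the helper `best_constant` and the data are assumptions made for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for quantile level tau."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        error = y - p
        # Under-prediction (error > 0) is weighted by tau,
        # over-prediction (error < 0) by (1 - tau).
        total += max(tau * error, (tau - 1) * error)
    return total / len(y_true)


# Toy targets: the integers 0, 1, ..., 100.
y = [float(i) for i in range(101)]


def best_constant(tau):
    """Observed value that minimizes the pinball loss when used as a constant prediction."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# The minimizers land on the 5th, 50th and 95th percentiles of the toy data.
print(best_constant(0.05), best_constant(0.5), best_constant(0.95))
```

A neural network trained with the same loss does the analogous thing conditionally on X instead of with a single constant.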

Let me know if you need more information.

Valentin-Laurent avatar Nov 19 '24 14:11 Valentin-Laurent

Thank you very much for your reply @Valentin-Laurent!

I think I've managed to implement it successfully :-)

Now another question has come to me: is it advisable that the models used to predict the quantiles have the same architecture as those used to make the actual predictions? (i.e., let's suppose I can't pre-train my model but I can train a proper model for quantile estimation)

Thanks again!

P.S. For anyone interested:

import tensorflow as tf

# Reuses the imports, data splits and TrainedKerasRegressorWrapper from the script above.

def pinball_loss(y_true, y_pred, tau=.5):
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    error = y_true - y_pred
    return tf.reduce_mean(tf.maximum(tau * error, (tau - 1) * error))

def train_and_save_model(loss_fn, file_name):
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
        Dense(32, activation='relu'),
        Dense(1)
    ])
    model.compile(optimizer='adam', loss=loss_fn)
    model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=.2, verbose=0)
    model.save(file_name)
    return model

alpha = .1

# Lower quantile (alpha/2), upper quantile (1 - alpha/2), then the point-prediction model.
# Note: the 'mse' model estimates the conditional mean, used here in place of the 0.5 quantile.
model_list = [
    train_and_save_model(lambda y_true, y_pred: pinball_loss(y_true, y_pred, tau=alpha / 2), 'model_low.keras'),
    train_and_save_model(lambda y_true, y_pred: pinball_loss(y_true, y_pred, tau=1 - alpha / 2), 'model_up.keras'),
    train_and_save_model('mse', 'model.keras')
]

# Order expected by MapieQuantileRegressor with cv='prefit': lower, upper, median.
model_files = ['model_low.keras', 'model_up.keras', 'model.keras']
wrapped_models = []

for file in model_files:
    # compile=False avoids having to deserialize the custom loss.
    loaded_model = load_model(file, compile=False)
    wrapped_model = TrainedKerasRegressorWrapper(loaded_model)
    wrapped_models.append(wrapped_model)

mapie_regressor = MapieQuantileRegressor(
    estimator=wrapped_models, cv='prefit', alpha=alpha)

# ... (MAPIE regressor predictions)

manjavacas avatar Nov 20 '24 10:11 manjavacas

Hello @manjavacas, I'm glad you managed to implement it successfully :)

To answer your follow-up question: there is no need for the quantile models to have the same architecture as your pretrained model. In my opinion, ultimately, the better your models are at predicting quantiles, the better your intervals will be (in terms of adaptivity and width).

Let's ask @vincentblot28 or @thibaultcordier to confirm.

Valentin-Laurent avatar Nov 20 '24 10:11 Valentin-Laurent

Yep, I suppose that is not a disadvantage, quite the opposite.

On the other hand, I understand that if my 'real' model fits the target well (average value, close to 0.5 quantile), the same architecture will work well for predicting other quantiles...

Thanks 👍🏻

manjavacas avatar Nov 20 '24 10:11 manjavacas

Hello @manjavacas, indeed, at the end of the day, the better your model, the better your prediction intervals. However, you should keep in mind that conformal predictions estimate the uncertainty of your model (the one you use to make point predictions).

Quantile regression is a slightly different case: the idea is to use two quantile regressors to get a first estimate of the size of your prediction intervals, and then to add a conformal prediction layer on top to provide coverage guarantees.
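
To make that conformal layer concrete, here is a minimal sketch of the standard symmetric CQR correction in plain Python. It is an illustration under stated assumptions, not MAPIE's internal code: the function names and toy calibration data are made up, and capping the rank at n is a simplification (in theory the interval is unbounded in that case).

```python
import math


def cqr_correction(y_cal, q_lo_cal, q_hi_cal, alpha):
    """Symmetric CQR correction term computed on a calibration set."""
    # Conformity score: distance by which the target falls outside its raw
    # interval (negative when the target lies strictly inside).
    scores = sorted(
        max(lo - y, y - hi) for y, lo, hi in zip(y_cal, q_lo_cal, q_hi_cal)
    )
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)), capped at n (simplification).
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]


def conformalize(q_lo, q_hi, correction):
    """Widen (or shrink, if the correction is negative) the raw quantile intervals."""
    return [lo - correction for lo in q_lo], [hi + correction for hi in q_hi]


# Toy calibration set (made up): every raw interval is [0, 1].
y_cal = [0.5, 0.25, 1.5, -0.5, 0.75, 1.25, 2.0]
q_lo_cal = [0.0] * len(y_cal)
q_hi_cal = [1.0] * len(y_cal)

correction = cqr_correction(y_cal, q_lo_cal, q_hi_cal, alpha=0.25)
lower, upper = conformalize([0.0], [1.0], correction)
print(correction, lower, upper)
```

The raw quantile models decide where the interval sits and how it adapts to X, while the calibration step only shifts both bounds by a single constant.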

In this case, your point prediction model can be very different from your quantile regressors. However, the size of your prediction interval won't necessarily relate to the uncertainty of your point predictor (your prediction may even fall outside your interval in some extreme cases).

Conclusion: if you're only interested in the prediction intervals, you can absolutely use different model architectures. However, if you want to quantify the uncertainty of your predictive model, it is advisable to use the same architecture.

vincentblot28 avatar Nov 20 '24 10:11 vincentblot28

Perfect, it's clear to me and now I understand the differences. Thanks!

All solved on my side ✅

manjavacas avatar Nov 20 '24 11:11 manjavacas