
Create a Split Conformal API that operates directly on label and prediction arrays.

Open RudrakshTuwani opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. Currently, estimators are required to expose the scikit-learn API. This can be problematic with some deep learning frameworks. While it is easy to create a wrapper, generating the predictions can itself take some time or may run on different hardware (e.g. get predictions on a GPU instance, save them to disk, and then create conformal prediction sets on a CPU instance). Additionally, given the compute costs of deep learning models and the diminishing returns of a larger calibration set, a Split Conformal framework is often ideal for deep learning problems.

As an aside, it would also potentially be faster to prototype and release new methods for the Split Conformal framework.

Describe the solution you'd like Here's an example template:

class SplitConformal():
  def __init__(self):
    pass
  
  def fit(self, Y_calib, Y_calib_pred, conformity_score_fn):
    pass

  def predict(self, Y_test, Y_test_pred, inv_conformity_score_fn):
    pass

The above setup generalizes to both classification and regression through appropriate specification of conformity_score_fn and inv_conformity_score_fn. Furthermore, it is easily extensible to whatever score function a user comes up with (quantile-based, one that uses both a mean and a standard-error estimate, etc.).
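To illustrate how such a template could work for regression, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not MAPIE code: the class name SplitConformalSketch, the helper functions abs_residual_score and abs_residual_inverse, the alpha argument, and the choice to have predict take only the test predictions (no test labels). It uses the absolute residual as the conformity score and the usual finite-sample rank ceil((n + 1)(1 - alpha)) for the threshold.

```python
import numpy as np


def abs_residual_score(y, y_pred):
    """Conformity score: absolute residual (one common choice for regression)."""
    return np.abs(y - y_pred)


def abs_residual_inverse(y_pred, q):
    """Invert the score: all y with |y - y_pred| <= q form [y_pred - q, y_pred + q]."""
    return np.stack([y_pred - q, y_pred + q], axis=1)


class SplitConformalSketch:
    """Hypothetical array-based split conformal helper (not part of MAPIE)."""

    def fit(self, y_calib, y_calib_pred, conformity_score_fn):
        # Only labels and predictions are needed; no model object is involved.
        scores = conformity_score_fn(np.asarray(y_calib), np.asarray(y_calib_pred))
        self.scores_ = np.sort(scores)
        return self

    def predict(self, y_test_pred, inv_conformity_score_fn, alpha=0.1):
        n = self.scores_.shape[0]
        # Finite-sample corrected rank of the calibration score to use.
        k = int(np.ceil((n + 1) * (1 - alpha)))
        # If k > n, the calibration set is too small for this alpha.
        q = self.scores_[k - 1] if k <= n else np.inf
        return inv_conformity_score_fn(np.asarray(y_test_pred), q)


# Synthetic stand-ins for predictions computed elsewhere and loaded from disk.
rng = np.random.default_rng(0)
y_calib_pred = rng.normal(size=500)
y_calib = y_calib_pred + rng.normal(scale=0.5, size=500)
y_test_pred = rng.normal(size=200)
y_test = y_test_pred + rng.normal(scale=0.5, size=200)

sc = SplitConformalSketch().fit(y_calib, y_calib_pred, abs_residual_score)
intervals = sc.predict(y_test_pred, abs_residual_inverse, alpha=0.1)
coverage = np.mean((intervals[:, 0] <= y_test) & (y_test <= intervals[:, 1]))
print(f"empirical coverage: {coverage:.3f}")
```

Swapping in a different pair of score and inverse functions (for example a normalized residual) would change the shape of the intervals without touching the calibration logic, which is the kind of extensibility the template is aiming for.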

Describe alternatives you've considered We can use cv="prefit" as part of MapieRegressor to do Split Conformal calibration. However, the problem remains that the predictions have to be generated within predict (no way to calibrate existing predictions).

RudrakshTuwani avatar Mar 30 '22 13:03 RudrakshTuwani

Hi @RudrakshTuwani, thanks for raising this issue! I guess a simple solution to this problem would be to add an optional argument y to the predict method. If y is not None, then y would be used as y_pred throughout predict. What do you think?

By the way, sorry for not replying yet to your PR, but we are currently really busy and we hope to find some time to address your contributions within the next few weeks.

vtaquet avatar Apr 08 '22 14:04 vtaquet

Hey @vtaquet, thanks for getting back and no worries! :)

I guess that could work for predict, but I can't think of a clean solution for fit. Also, the primary motivation behind this is to have a conformal method that is agnostic to the model API and can be easily extended to incorporate cutting-edge conformal methods.

RudrakshTuwani avatar Apr 09 '22 02:04 RudrakshTuwani

Hi @RudrakshTuwani, this implementation would require a substantial refactoring of our code. As a simple solution, you could use this wrapper:

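import numpy as np


class FakeModel:
    """Pass-through "model": X is expected to hold precomputed probabilities."""

    def __init__(self):
        self.pred_proba = None
        self.trained_ = True
        # Class labels; binary here, adjust to match your problem.
        self.classes_ = np.array([0, 1])

    def fit(self, X, y):
        # Nothing to train: the predictions already exist.
        pass

    def predict_proba(self, X):
        # X already contains the class probabilities, so return it unchanged.
        return X

    def get_params(self, deep=True):
        return {}

    def predict(self, X):
        # Predicted class index = column with the highest probability.
        pred_proba = self.predict_proba(X)
        return np.argmax(pred_proba, axis=1)

    def __sklearn_is_fitted__(self):
        # Lets scikit-learn's fitted-check pass without any training.
        return True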

where the input X is in fact the predictions that you computed earlier. That way, there is no need to call the "real" predict method of your model.
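To make the pass-through idea concrete, here is a small self-contained sketch using only NumPy. It is an illustration under assumptions, not a call into MAPIE: PassthroughModel is a hypothetical name mirroring the FakeModel above, the probabilities are synthetic, and the calibration step is a hand-written version of a simple "1 minus probability of the true class" split conformal score.

```python
import numpy as np


class PassthroughModel:
    """Hypothetical stand-in: treats its input as already-computed probabilities."""

    def __init__(self, n_classes):
        self.classes_ = np.arange(n_classes)

    def fit(self, X, y):
        return self  # nothing to train

    def predict_proba(self, X):
        return np.asarray(X)  # X already holds the probabilities

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Synthetic stand-ins for probabilities produced earlier and loaded from disk.
rng = np.random.default_rng(0)
proba_calib = rng.dirichlet(np.ones(3), size=300)
y_calib = np.array([rng.choice(3, p=p) for p in proba_calib])
proba_test = rng.dirichlet(np.ones(3), size=5)

model = PassthroughModel(n_classes=3)

# Calibration: score = 1 - probability assigned to the true class.
calib_proba = model.predict_proba(proba_calib)
calib_scores = 1.0 - calib_proba[np.arange(len(y_calib)), y_calib]

alpha = 0.1
n = len(calib_scores)
# Finite-sample corrected rank of the calibration score used as threshold.
k = int(np.ceil((n + 1) * (1 - alpha)))
q = np.sort(calib_scores)[k - 1] if k <= n else np.inf

# Prediction sets: keep every class whose score does not exceed the threshold.
pred_sets = (1.0 - model.predict_proba(proba_test)) <= q
print(pred_sets)
```

Each row of pred_sets is a boolean mask over the classes; the model object never computes anything itself, which is the whole point of the wrapper.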

We will look for a cleaner solution in the long term. I am closing the issue for now (closed as not planned); we will reopen it when this change is added to our roadmap.

vincentblot28 avatar Mar 02 '23 08:03 vincentblot28