Create a Split Conformal API that operates directly on labels, predictions arrays.
**Is your feature request related to a problem? Please describe.**
Currently, estimators are required to expose a scikit-learn API. This can be problematic when using some of the deep learning frameworks. While it is easy to create a wrapper, generating the predictions can itself take some time or be run on different hardware (e.g. get predictions on a GPU instance, save them to disk, and then create conformal prediction sets on a CPU instance). Additionally, given the compute costs of deep learning models and the diminishing returns of a larger calibration set, a Split Conformal framework is often ideal for deep learning problems.
As an aside, it would also potentially be faster to prototype and release new methods for the Split Conformal framework.
**Describe the solution you'd like**
Here's an example template:
```python
class SplitConformal():
    def __init__(self):
        pass

    def fit(self, Y_calib, Y_calib_pred, conformity_score_fn):
        pass

    def predict(self, Y_test, Y_test_pred, inv_conformity_score_fn):
        pass
```
The above setup generalizes to both classification and regression through appropriate specification of `conformity_score_fn` and `inv_conformity_score_fn`. Furthermore, it is also easily extensible to whatever score function a user comes up with (quantile-based, using both mean and std error estimates, etc.).
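To make the idea concrete, here is a minimal, hypothetical numpy sketch of how such an API could behave for regression with the absolute-residual score (the class and argument names follow the template above; the `alpha` parameter and the slightly simplified `predict` signature are my own assumptions, not part of the proposal):

```python
import numpy as np

class SplitConformal:
    """Sketch of a split-conformal calibrator operating on raw arrays."""

    def fit(self, Y_calib, Y_calib_pred, conformity_score_fn, alpha=0.1):
        scores = conformity_score_fn(Y_calib, Y_calib_pred)
        n = len(scores)
        # Conformal quantile with the usual finite-sample correction
        q_level = np.ceil((n + 1) * (1 - alpha)) / n
        self.q_hat_ = np.quantile(scores, min(q_level, 1.0))
        return self

    def predict(self, Y_test_pred, inv_conformity_score_fn):
        # Turn the calibrated quantile back into prediction sets/intervals
        return inv_conformity_score_fn(Y_test_pred, self.q_hat_)

# One possible score pair: absolute residuals -> symmetric intervals
score = lambda y, y_pred: np.abs(y - y_pred)
inv_score = lambda y_pred, q: np.stack([y_pred - q, y_pred + q], axis=1)

rng = np.random.default_rng(0)
y_calib = rng.normal(size=1000)
y_calib_pred = y_calib + rng.normal(scale=0.5, size=1000)

sc = SplitConformal().fit(y_calib, y_calib_pred, score, alpha=0.1)
intervals = sc.predict(np.array([0.0, 1.0]), inv_score)  # shape (2, 2)
```

A quantile-based or mean/std score would only change the `score`/`inv_score` pair, which is the extensibility argument made above.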
**Describe alternatives you've considered**
We can use `cv="prefit"` as part of `MapieRegressor` to do Split Conformal calibration. However, the problem remains that the predictions have to be generated within `predict` (there is no way to calibrate existing predictions).
Hi @RudrakshTuwani , thanks for raising this issue ! I guess a simple solution to this problem would be to add an optional argument y in the predict method. If y is not None then y should be used as y_pred throughout predict. What do you think ?
By the way, sorry for not replying yet to your PR, but we are currently really busy and we hope to find some time to address your contributions within the next few weeks.
Hey @vtaquet , thanks for getting back and no worries! :)
I guess that could work for `predict`, but I can't think of a clean solution for `fit`. Also, the primary motivation behind this is to have a model-API-agnostic conformal method that can be easily extended to incorporate cutting-edge conformal methods.
Hi @RudrakshTuwani, this implementation would need quite a huge refactoring of our code. As a simple solution, you could use this wrapper:
```python
import numpy as np

class FakeModel:
    def __init__(self):
        self.pred_proba = None
        self.trained_ = True
        self.classes_ = np.array([0, 1])

    def fit(self, X, y):
        pass

    def predict_proba(self, X):
        # X already contains the precomputed probabilities
        return X

    def get_params(self, deep=True):
        return {}

    def predict(self, X):
        pred_proba = self.predict_proba(X)
        return np.argmax(pred_proba, axis=1)

    def __sklearn_is_fitted__(self):
        return True
```
where the input `X` is in fact the predictions that you computed earlier. Doing so, there is no need to call the "real" `predict` method of your model.
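As a short usage sketch of this trick (redefining the wrapper here so the snippet is self-contained; the example probabilities are made up):

```python
import numpy as np

class FakeModel:
    """Wrapper that treats its input X as precomputed class probabilities."""

    def __init__(self):
        self.trained_ = True
        self.classes_ = np.array([0, 1])

    def fit(self, X, y):
        pass

    def predict_proba(self, X):
        return X  # X is the saved probability matrix, not raw features

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

    def get_params(self, deep=True):
        return {}

    def __sklearn_is_fitted__(self):
        return True

# Probabilities saved earlier, e.g. from a GPU inference run
precomputed_proba = np.array([[0.9, 0.1], [0.2, 0.8]])
model = FakeModel()
labels = model.predict(precomputed_proba)  # → array([0, 1])
```

The wrapper could then be passed with `cv="prefit"` so the library never re-runs the underlying deep learning model.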
We will look for a cleaner solution in the long term. I'll close the issue for now (Closed as not planned); we will re-open it when integrating this change into our roadmap.