Design - Improve MAPIE and MLflow interaction
Hi! This is a bit of a general question / suggestion. I have trouble working with MAPIE and mlflow for experiment / model tracking. That is a bit of a pity, because it limits the usability of an otherwise nice library.
Is your feature request related to a problem? Please describe.
The model.predict() output of Tuple[Array, Tuple[Array, Array]] is not super self-explanatory and a bit cumbersome when it comes to further downstream processing, especially with mlflow experiment tracking / deployment.
Suggestion / possible solution (but very open for discussion)
A relatively straightforward solution would be to have the model output a Dict({"mean": Array, "lower": Array, "upper": Array}). That way it is clear what is what, and this ought to be accepted by the mlflow infer_signature(). (I've monkey-patched my estimator to check this.) To avoid breaking changes, one could add an output_format parameter to the estimator class.
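For illustration, here is a minimal sketch of the conversion such a patch could perform. It does not call MAPIE; it assumes the interval array has shape (n_samples, 2, n_alpha) with the lower bound at index 0 and the upper bound at index 1 of the second axis. The helper name to_named_output and the alpha_index parameter are hypothetical, not part of any library.

```python
import numpy as np


def to_named_output(y_pred, y_pis, alpha_index=0):
    """Turn a (point prediction, intervals) pair into a dict of 1-D arrays.

    Assumption: y_pis has shape (n_samples, 2, n_alpha), where axis 1 holds
    (lower, upper) and axis 2 indexes the requested alpha values.
    """
    y_pred = np.asarray(y_pred)
    y_pis = np.asarray(y_pis)
    return {
        "mean": y_pred,
        "lower": y_pis[:, 0, alpha_index],
        "upper": y_pis[:, 1, alpha_index],
    }


# Stand-in data instead of a fitted estimator: three predictions, one alpha.
y_pred = np.array([1.0, 2.0, 3.0])
y_pis = np.stack([y_pred - 0.5, y_pred + 0.5], axis=1)[:, :, np.newaxis]

named = to_named_output(y_pred, y_pis)
print({key: value.shape for key, value in named.items()})
```

Every value in the dict is a plain 1-D array under a descriptive key, which is the property the suggestion relies on for signature inference.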
Did somebody find other ways to work well with MAPIE and mlflow apart from monkey patching? Appreciate any input :)
Cheers, Simon
Hey @simon-hirsch,
Thank you for raising this; it seems your monkey patch works around the problem for the moment! This is not something we had taken into account. We do have a very specific structure for the output of conformal predictions. Also note that for some models, you can provide multiple alphas in model.predict(), meaning that:
print(mapie_regressor.predict(X_test, alpha=0.2)[0].shape)
print(mapie_regressor.predict(X_test, alpha=0.2)[1].shape)
# output
(250,)
(250, 2, 1)
and
print(mapie_regressor.predict(X_test, alpha=[0.2, 0.3])[0].shape)
print(mapie_regressor.predict(X_test, alpha=[0.2, 0.3])[1].shape)
# output
(250,)
(250, 2, 2)
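To make the axes concrete, here is a small sketch that uses synthetic arrays in place of a fitted regressor. It assumes the second axis holds (lower, upper) and the third axis follows the order of the alpha list; the half-widths are illustrative numbers only.

```python
import numpy as np

alphas = [0.2, 0.3]
n_samples = 250

# Synthetic stand-ins for the two arrays returned by predict().
rng = np.random.default_rng(0)
y_pred = rng.normal(size=n_samples)
half_widths = np.array([1.28, 1.04])  # illustrative, one per alpha
y_pis = np.stack(
    [y_pred[:, None] - half_widths, y_pred[:, None] + half_widths],
    axis=1,
)
print(y_pis.shape)  # (n_samples, 2, n_alpha)

# One (lower, upper) pair of 1-D arrays per alpha.
intervals = {
    alpha: (y_pis[:, 0, j], y_pis[:, 1, j])
    for j, alpha in enumerate(alphas)
}
```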
This is a comment we will take into account for future changes, so thank you!
Hello,
This issue will be addressed with the release of MAPIE v1.
The output of model.predict(), currently structured as Tuple[Array, Tuple[Array, Array]], will be split across two distinct methods:
- model.predict() for point predictions, with output shape (n_samples,)
- model.predict_set() for interval predictions, with output shape (n_samples, 2)
Cool, looking forward. Do you also plan to support multiple sets at once, i.e. something along the lines of: estimator.predict_sets(X, widths=[0.5, 0.75, 0.9]) with output shape (n, 6)?
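As a sketch of what such an (n, 6) layout could look like: the names predict_sets and widths above are only a proposal, so the snippet below does not call MAPIE. It starts from an interval array of assumed shape (n_samples, 2, n_widths) and flattens it so that each width contributes an adjacent (lower, upper) column pair. The helper flatten_intervals and the column naming are hypothetical.

```python
import numpy as np


def flatten_intervals(y_pis, widths):
    """Flatten (n_samples, 2, n_widths) intervals to (n_samples, 2 * n_widths).

    Columns are ordered lower_w0, upper_w0, lower_w1, upper_w1, ...
    """
    y_pis = np.asarray(y_pis)
    n_samples, n_bounds, n_widths = y_pis.shape
    if n_bounds != 2 or n_widths != len(widths):
        raise ValueError("expected shape (n_samples, 2, len(widths))")
    # Move the width axis before the bound axis, then merge the last two axes.
    flat = y_pis.transpose(0, 2, 1).reshape(n_samples, 2 * n_widths)
    columns = [f"{bound}_{w}" for w in widths for bound in ("lower", "upper")]
    return flat, columns


widths = [0.5, 0.75, 0.9]
y_pis = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)  # dummy values
flat, columns = flatten_intervals(y_pis, widths)
print(flat.shape, columns)
```

Returning the column names alongside the array keeps the flat output self-describing, which a bare (n, 6) array would not be.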
Hello @simon-hirsch. As you may have noticed, we released v1 a few weeks ago. We made some design changes compared to what @jawadhussein462 stated previously, and I don't think your issue is resolved by the release.
However, I definitely agree that improving signature readability, and thus interaction with MLFlow, would be a good improvement.
As a first step, I'd like to figure out what the minimum viable change could be for native MLflow compatibility. I can see that MLflow supports different model signatures, but to be honest I'm not very familiar with it. It seems that arrays are supported. Have you tried defining a tuple of arrays as a signature?