puncc Heavy CQR object

Hi! First of all, thank you for your work! We are currently using your package, specifically the CQR class (with two XGBoost models inside a DualPredictor).

When I serialised the object using joblib, I realised that the resulting file was about 500 MB.

Checking the code, I found this line, where the data is saved as an attribute of the IdSplitter: https://github.com/deel-ai/puncc/blob/d09f77307616405b5585a5f0d9c94aa30f2f9d99/deel/puncc/api/splitting.py#L82

Is this expected behavior? I fixed it by setting:

my_object.conformal_predictor.splitter._split = None

before saving the artifact.
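
For anyone who wants to check where the size comes from, here is a minimal sketch of the idea. It does not use puncc at all: `predictor`, `splitter` and `model` are hypothetical stand-ins built from `SimpleNamespace`, the helpers `pickled_size` and `attribute_sizes` are names made up for this example, and `pickle` is used in place of joblib so the snippet only needs the standard library. It measures the pickled size of each attribute, then applies the same `_split = None` step as above.

```python
import pickle
import random
from types import SimpleNamespace


def pickled_size(obj) -> int:
    """Return the number of bytes pickle needs to serialise obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def attribute_sizes(obj) -> dict:
    """Map each instance attribute name to its pickled size in bytes."""
    return {name: pickled_size(value) for name, value in vars(obj).items()}


# Hypothetical stand-ins, NOT puncc classes: a splitter that caches the
# data it split, held by a predictor that also stores a tiny "model".
random.seed(0)
X = [[random.random() for _ in range(10)] for _ in range(20_000)]
y = [sum(row) for row in X]
half = len(X) // 2

splitter = SimpleNamespace(_split=[(X[:half], y[:half], X[half:], y[half:])])
predictor = SimpleNamespace(splitter=splitter, model={"bias": 0.5})

# Per-attribute sizes show which attribute dominates the artifact.
sizes_before = attribute_sizes(predictor)
total_before = pickled_size(predictor)

# Same step as in this issue: drop the cached split before serialising.
predictor.splitter._split = None
total_after = pickled_size(predictor)

print(sizes_before)
print(f"before: {total_before / 1e6:.2f} MB, after: {total_after} bytes")
```

In this toy setup almost all of the bytes sit under the `splitter` attribute, which matches what I saw with the real object.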

Sep 26 '24 15:09 diegoglozano

Hi @diegoglozano,

Thank you for your feedback.

We currently provide the method save for ConformalPredictor serialization. However, I understand that it may not address your concern, as it still serializes the data. To resolve this, I’ll introduce a flag argument that lets you specify whether the splitter should be saved or not.

In the meantime, your solution works well and can be used with no negative impact.

Sep 30 '24 10:09 M-Mouhcine

Hi @diegoglozano,

In our latest release, we introduced the save_data argument to deel.puncc.api.conformalization.ConformalPredictor.save. To achieve the behavior you're looking for, simply set this parameter to False. You can find details in the documentation.
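
As a rough illustration of what a flag like save_data does from the caller's side, here is a sketch using stand-in objects. It is not puncc's implementation: the `save` function below is a hypothetical helper written for this example, the objects are plain `SimpleNamespace` instances, and the choice to copy the containers rather than clear the attribute in place is an assumption made so that the in-memory object keeps its data.

```python
import copy
import os
import pickle
import tempfile
from types import SimpleNamespace


def save(predictor, path, save_data=True):
    """Pickle predictor to path, optionally leaving out the cached split.

    Hypothetical helper for illustration, not puncc's implementation.
    """
    to_dump = predictor
    if not save_data:
        # Shallow-copy both containers so the caller's object is untouched.
        to_dump = copy.copy(predictor)
        to_dump.splitter = copy.copy(predictor.splitter)
        to_dump.splitter._split = None
    with open(path, "wb") as f:
        pickle.dump(to_dump, f)


# Stand-in objects: a splitter caching "data" and a predictor holding it.
splitter = SimpleNamespace(_split=[list(range(100_000))])
predictor = SimpleNamespace(splitter=splitter, model={"bias": 0.5})

with tempfile.TemporaryDirectory() as tmp:
    full_path = os.path.join(tmp, "full.pkl")
    light_path = os.path.join(tmp, "light.pkl")
    save(predictor, full_path)
    save(predictor, light_path, save_data=False)
    full_size = os.path.getsize(full_path)
    light_size = os.path.getsize(light_path)
    with open(light_path, "rb") as f:
        restored = pickle.load(f)

print(f"full: {full_size} bytes, light: {light_size} bytes")
```

Unlike setting `_split = None` by hand, this variant leaves the object in memory unchanged, so it can still be used after saving.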

I'm closing the issue but let me know if you have any other questions!

Cheers!

Oct 14 '24 17:10 M-Mouhcine