[Bug]: Instance variable is_trained is used incorrectly
Module
Prediction (API)
Contact Details
Current Behavior
DualPredictor is supposed to be initialized with is_trained parameter that should be a list of booleans. There are a few issues related to this variable:
- Using lists as default values is not recommended because it can lead to unwanted side effects. However, side effects may never occur if the parameter is not updated within constructor, see more here. You can use tuple instead.
- The instance variable
is_trainedis not updated after methodfitis completed:
for count, model in enumerate(self.models):
if not self.is_trained[count]:
model.fit(X, y, **dictargs[count])
#end of method
Instead, you may want to rewrite it like this:
for count, model in enumerate(self.models):
if not self.is_trained[count]:
model.fit(X, y, **dictargs[count])
self.is_trained[count] = True
#end of method
- Exception can be not thrown when it should in some cases when
is_trainedis a part of condition ofifstatement:
...
if self.train:
if self.splitter is None:
raise RuntimeError(
"The splitter argument is None but train is set to "
+ "True. Please provide a correct splitter to train "
+ "the underlying model."
)
logger.info(f"Fitting model on fold {i+cached_len}")
predictor.fit(X_fit, y_fit, **kwargs) # Fit K-fold predictor
# Make sure that predictor is already trained if train arg is False
elif self.train is False and predictor.is_trained is False:
raise RuntimeError(
"'train' argument is set to 'False' but model is not pre-trained"
)
else: # Skipping training
logger.info("Skipping training.")
...
In elif statement predictor.is_trained is used as boolean but in fact it can be a list if predictor is an instance of DualPredictor. In this case it will be True if the list is not empty even though a model can still be not trained.
The solution suggested in point 2 is only partial. I think the best way would be to keep is_trained variable private (rename it to _is_trained) and introduce a property is_trained which will behave as instance variable but will be implemented as a method under the hood. For example for DualPredictor it can look like this:
@property
def is_trained(self) -> bool:
return self._is_trained[0] and self._is_trained[1]
Expected Behavior
is_trained is expected to be boolean
is_trained is expected to change after models are trained
Version
v0.9
Environment
No response
Relevant log output
No response
To Reproduce
I do not have example of a code that actually fails because of that. The test named test_locally_adaptive_cp actually runs the part of the code with aforementioned elif statement.
Thanks @fink-stanislav for posting this issue and suggesting fixes.
- I agree that switching to tuples would be a better option.
- We already considered updating the
is_trainedstatus, but we must ensure it has no side effects elsewhere. Basically, this status serves as a double check during the conformalization process: the attributetrainassociated with an instance ofConformalPredictor(defined in moduledeel.puncc.api.conformalization) needs to be consistent with the training status of the underlying predictor. I will look further into that. - When
predictor.is_trainedis an iterable, the testpredictor.is_trained is Falseis indeed inconsistent. Using an attribute getter, as you suggest, looks like a great solution 👍 !
We will keep you updated on the progress regarding these issues. Thanks again for your findings.
Hi @fink-stanislav
We have fixed the issue that you reported. However, regarding your first point, switching to tuples would cause a break in the current API. Therefore, I will reopen this issue and consider it an enhancement for the future version 0.8.x.
Thank you for your valuable contribution, and please let us know if you have any further concerns or suggestions.