pyod
Cannot save AutoEncoder
The official instructions say to use joblib for pickling PyOD models.
This fails for AutoEncoders, or any other TensorFlow-backed model as far as I can tell. The error is:
```
>>> dump(model, 'model.joblib')
...
TypeError: can't pickle _thread.RLock objects
```
Note that it's not sufficient to save the underlying Keras Sequential model, since I need the methods & variables of BaseDetector (like `.decision_scores_` or `.decision_function()`).
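For reference, a minimal reproduction looks roughly like this (the random data and default model settings are just placeholders; any fitted AutoEncoder triggers the error):

```python
# Minimal reproduction sketch -- the data and model settings are placeholders
import numpy as np
from joblib import dump
from pyod.models.auto_encoder import AutoEncoder

X_train = np.random.randn(1000, 10)  # any numeric training data will do

model = AutoEncoder()  # TensorFlow/Keras-backed detector
model.fit(X_train)

dump(model, 'model.joblib')  # raises: TypeError: can't pickle _thread.RLock objects
```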
Sorry for being late on this. I recall some people mentioned that pickle may work. I haven't investigated; we should probably run some experiments.
I am having the same issue. I found a solution for the AutoEncoder thanks to this answer: https://github.com/yzhao062/pyod/issues/88#issuecomment-615343139 (neither pickle nor dill works for me), but I have the same problem with SOGAAL and MOGAAL and I don't know how to solve those.
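For context, the idea in that linked answer (and in the wrapper shared later in this thread) is to persist the Keras model separately and pickle the detector without it. A rough sketch, assuming a fitted AutoEncoder `clf` whose Keras model lives in `clf.model_` (attribute names can vary between pyod versions):

```python
# Rough sketch of the detach-and-save workaround; `clf` is a fitted pyod AutoEncoder
import joblib
from tensorflow import keras

# Save: persist the Keras model on its own, then pickle the detector without it
clf.model_.save('autoencoder_keras')  # TensorFlow SavedModel directory
keras_model, clf.model_ = clf.model_, None
joblib.dump(clf, 'autoencoder_detector.joblib')
clf.model_ = keras_model  # reattach so `clf` stays usable in this session

# Load: unpickle the detector, then reattach the Keras model
clf2 = joblib.load('autoencoder_detector.joblib')
clf2.model_ = keras.models.load_model('autoencoder_keras')
```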
Yes, I think it's important for PyOD models to have a unified save/load API. Right now, whether saving works depends on the underlying library each model uses.
I temporarily got around this by creating a wrapper class with different save/load logic for sklearn vs TF models.
@kennysong can you share your wrapper?
@ezzeldinadel
Unfortunately, the wrapper won't be that useful for you since it's for ensembles and is not a complete implementation.
I'll share some tips that might be a starting point for you, though.
- scikit-learn-based models can be pickled, but TensorFlow-based models require Keras's `.save()`
- So if you want to save/load an ensemble of mixed models, you'll need to save/load them as separate files
- I didn't bother to figure out how to save VAEs, but it should be possible some other way
Here's a snippet for reference:
```python
import joblib
from pathlib import Path
from tensorflow import keras


class EnsembleDetector:
    ...  # constructor, fit, etc. omitted; self.models is a list of fitted pyod detectors

    def save(self, folder):
        '''Saves the EnsembleDetector (as multiple files) in a given folder.'''
        # Save TF-based AutoEncoders in separate sub-directories (they don't pickle)
        tf_models = {}  # {index into self.models: Keras model}
        for i, model in enumerate(self.models):
            if 'AutoEncoder' in str(type(model)):
                model.model_.save(Path(folder) / str(i))
                tf_models[i] = model.model_
                model.model_ = None  # Remove the non-pickleable TF model so self can be pickled
            if 'VAE' in str(type(model)):
                raise Exception('VAE is not supported when saving the ensemble yet, since it uses a Lambda layer.')

        # Pickle the entire EnsembleDetector after the TF models are removed
        Path(folder).mkdir(parents=True, exist_ok=True)
        joblib.dump(self, Path(folder) / 'ensemble_detector.joblib')

        # Add the TF model objects back into self so it remains usable after saving
        for i in tf_models:
            self.models[i].model_ = tf_models[i]

    @staticmethod
    def load(folder):
        '''Loads the EnsembleDetector (from multiple files) in a given folder.'''
        # Unpickle the EnsembleDetector object
        ed = joblib.load(Path(folder) / 'ensemble_detector.joblib')

        # Load TF-based AutoEncoders from separate sub-directories (they don't pickle)
        for i, model in enumerate(ed.models):
            if 'AutoEncoder' in str(type(model)):
                model.model_ = keras.models.load_model(Path(folder) / str(i))
        return ed
```
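Usage would look something like the sketch below; since the constructor and fit logic are elided above, the detector list and the `decision_function` call are purely illustrative:

```python
# Hypothetical usage -- EnsembleDetector's constructor/fit are elided above
from pyod.models.auto_encoder import AutoEncoder
from pyod.models.iforest import IForest

ed = EnsembleDetector([AutoEncoder(), IForest()])  # hypothetical constructor signature
ed.fit(X_train)

ed.save('saved_ensemble')   # writes ensemble_detector.joblib plus one sub-dir per AutoEncoder
ed2 = EnsembleDetector.load('saved_ensemble')
scores = ed2.decision_function(X_test)  # hypothetical scoring method on the ensemble
```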
Having custom saving for Keras models would be useful.