Cannot save AutoEncoder

Open kennysong opened this issue 4 years ago • 6 comments

The official instructions say to use joblib for pickling PyOD models.

This fails for AutoEncoders, or any other TensorFlow-backed model as far as I can tell. The error is:

>>> dump(model, 'model.joblib')
...
TypeError: can't pickle _thread.RLock objects

Note that it's not sufficient to save the underlying Keras Sequential model, since I need the methods & variables of BaseDetector (like .decision_scores_ or .decision_function()).
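For context, the error above is what Python's pickle machinery (which joblib builds on) raises when an object graph contains a thread lock. Here is a minimal sketch that reproduces the same class of failure using only the standard library. `HoldsLock` and `try_pickle` are hypothetical names used for illustration; they are not part of PyOD or TensorFlow, and the exact wording of the message varies across Python versions.

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for a model object that keeps a thread lock internally."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]
        self._lock = threading.RLock()  # thread locks cannot be pickled


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


error = try_pickle(HoldsLock())
print(error)  # a TypeError message that mentions _thread.RLock
```

Any object that holds such a lock somewhere in its attributes fails the same way, which is consistent with the failure appearing for TensorFlow-backed detectors but not for scikit-learn-style ones.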

kennysong avatar Dec 06 '20 07:12 kennysong

Sorry for being late on this. I recall some people mentioning that pickle may work, but I haven't investigated. I should probably run some experiments.

yzhao062 avatar Jan 15 '21 18:01 yzhao062

I am having the same issue. I found a solution for the AutoEncoder thanks to this answer: https://github.com/yzhao062/pyod/issues/88#issuecomment-615343139 (pickle and dill do not work for me). However, I have the same problem with SOGAAL and MOGAAL, and I don't know how to solve it.

TimotheeGr avatar Jan 15 '21 21:01 TimotheeGr

Yes, I think it's important for PyOD models to have a unified save/load API. Right now, it randomly breaks based on the underlying library each model uses.

I temporarily got around this by creating a wrapper class with different save/load logic for sklearn vs TF models.

kennysong avatar Jan 16 '21 02:01 kennysong

@kennysong can you share your wrapper?

ezzeldinadel avatar Feb 09 '21 20:02 ezzeldinadel

@ezzeldinadel

Unfortunately, the wrapper won't be that useful for you since it's for ensembles and is not a complete implementation.

I'll share some tips that might be a starting point for you, though.

  • scikit-learn models can be pickled, but TensorFlow models require using .save()
  • So if you want to save/load an ensemble of models, you'll need to save/load it in separate files
  • I didn't bother to figure out how to save VAEs, but it should be possible in another way

Here's a snippet as reference.

from pathlib import Path

import joblib
from tensorflow import keras  # assumption: use plain `import keras` if that is what your PyOD install uses

class EnsembleDetector:

    ...

    def save(self, folder):
        '''Saves the EnsembleDetector (as multiple files) in a given folder.'''
        folder = Path(folder)
        # Fail early, before any TF model has been detached from self
        if any('VAE' in str(type(model)) for model in self.models):
            raise Exception('VAE is not supported when saving the ensemble yet, since it uses a Lambda layer.')
        folder.mkdir(parents=True, exist_ok=True)

        # Save TF-based AutoEncoders in separate sub-directories (they don't pickle)
        tf_models = {}   # {index into self.models: Keras model}
        try:
            for i, model in enumerate(self.models):
                if 'AutoEncoder' in str(type(model)):
                    model.model_.save(folder/str(i))
                    tf_models[i] = model.model_
                    model.model_ = None  # Remove non-pickleable TF models from self so we can pickle self

            # Pickle the entire EnsembleDetector after the TF models are removed
            joblib.dump(self, folder/'ensemble_detector.joblib')
        finally:
            # Add the TF model objects back into self, even if saving failed
            for i, tf_model in tf_models.items():
                self.models[i].model_ = tf_model

    @staticmethod
    def load(folder):
        '''Loads the EnsembleDetector (from multiple files) in a given folder.'''
        # Unpickle the EnsembleDetector object
        ed = joblib.load(Path(folder)/'ensemble_detector.joblib')

        # Load TF-based AutoEncoders from separate sub-directories (they don't pickle)
        for i, model in enumerate(ed.models):
            if 'AutoEncoder' in str(type(model)):
                model.model_ = keras.models.load_model(Path(folder)/str(i))

        return ed
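
As an alternative to detaching and reattaching `model_` by hand, the same idea can be expressed through Python's `__getstate__` / `__setstate__` hooks, so that pickling skips the unpicklable attribute automatically. This is only a sketch under stated assumptions: `DetectorWithHandle` and `backend_` are hypothetical names, the `RLock` merely stands in for a TensorFlow model, and none of this is part of PyOD.

```python
import pickle
import threading


class DetectorWithHandle:
    """Hypothetical stand-in for a detector that owns an unpicklable backend object."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.backend_ = threading.RLock()  # stands in for the TF/Keras model

    def __getstate__(self):
        # Copy the instance dict and blank out the unpicklable attribute.
        state = self.__dict__.copy()
        state["backend_"] = None
        return state

    def __setstate__(self, state):
        # Restore the plain attributes; the backend has to be reloaded separately.
        self.__dict__.update(state)


original = DetectorWithHandle(threshold=0.5)
restored = pickle.loads(pickle.dumps(original))
print(restored.threshold, restored.backend_)
```

The original object keeps its backend, while the restored copy comes back with `backend_` set to `None`. In a real wrapper the backend would still need to be written out and reloaded separately, as in the snippet above.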

kennysong avatar Feb 09 '21 23:02 kennysong

Having custom saving for Keras models would be useful.

arita37 avatar Mar 03 '21 01:03 arita37