Pickle error when using deep learning based models with SUOD
Environment: WSL2 with Conda (Python 3.8.10)
Error:
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 356, in _sendback_result
result_queue.put(_ResultItem(work_id, result=result,
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/backend/queues.py", line 241, in put
obj = dumps(obj, reducers=self._reducers)
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
dump(obj, buf, reducers=reducers, protocol=protocol)
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
_LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.RLock' object
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "suod_test.py", line 51, in <module>
clf.fit(X_train)
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/pyod/models/suod.py", line 210, in fit
self.model_.fit(X)
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/suod/models/base.py", line 290, in fit
all_results = Parallel(n_jobs=n_jobs, max_nbytes=None, verbose=True)(
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
self.retrieve()
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/home/pbosch/miniconda3/envs/data_science/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
TypeError: cannot pickle '_thread.RLock' object
How to reproduce: Add AutoEncoder or DeepSVDD to the list of detectors in the base SUOD example (e.g. DeepSVDD(hidden_neurons=[2, 1])).
Just for completeness' sake: using AutoEncoder or DeepSVDD alone works perfectly fine.
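The root cause in the traceback can be illustrated with the standard library alone: the Keras/TensorFlow objects inside the deep models hold thread locks, and joblib's loky backend has to pickle the fitted models to send results back from worker processes, which fails on any `_thread.RLock`. A minimal stand-in (the `ModelWithLock` class is hypothetical, just mimicking a model object that carries a lock):

```python
import pickle
import threading


class ModelWithLock:
    """Stand-in for a fitted deep model whose internals hold a lock."""

    def __init__(self):
        # Keras/TF objects carry locks like this one internally.
        self.lock = threading.RLock()


try:
    pickle.dumps(ModelWithLock())
    failed = False
except TypeError as e:
    failed = True
    message = str(e)  # "cannot pickle '_thread.RLock' object"
```

This is why the error only appears under SUOD's process-based parallelism and not when the deep models are fitted alone in the main process.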
Thanks for the note. I forgot to mention in the documentation that deep learning models will not benefit from SUOD acceleration... The bottleneck of deep learning training is mainly access to GPUs... Sorry for the confusion.
In that case, what is the best approach to get the acceleration for non-deep-learning models while still being able to combine them with deep learning models? SUOD has that mechanism built in, which is handy. It would also be possible to train each model separately and then use combination. But is there a way to have both?
This is a great point! I think the combination can come from two perspectives. First, you may consider using deep learning models as feature extractors, and then apply the classical OD models on the extracted latent representations.
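A minimal sketch of this first approach, with stand-ins so it runs without pyod or TensorFlow installed: PCA plays the role of the trained encoder, and scikit-learn's IsolationForest plays the role of the classical OD model (a pyod detector such as KNN or LOF would slot in the same way):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
X[:10] += 6  # plant a few obvious outliers

# Step 1: learn a low-dimensional latent representation.
# A trained AutoEncoder's encoder would go here; PCA is a stand-in.
latent = PCA(n_components=5).fit_transform(X)

# Step 2: run a classical outlier detector on the latent space.
clf = IsolationForest(contamination=0.02, random_state=42)
labels = clf.fit_predict(latent)  # -1 = outlier, 1 = inlier
```

The key point is that only step 1 involves the deep model; step 2 stays cheap and can still be parallelized (e.g. via SUOD) over multiple classical detectors.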
Second, you could construct a matrix holding each detector's outlier scores, and then combine them as shown in https://github.com/yzhao062/pyod/blob/master/examples/comb_example.py.
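A sketch of the second approach using only numpy and scikit-learn: standardize each detector's score column so the scales are comparable, then combine. This mirrors what comb_example.py does with pyod's `standardizer`, `average`, and `maximization` helpers, but with random values standing in for real `decision_scores_`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_detectors = 200, 4

# One column per trained detector; in pyod these would be each
# detector's decision_scores_. Random values on different scales
# stand in here to show why standardization matters.
scores = rng.normal(size=(n_samples, n_detectors)) * np.array([1.0, 10.0, 0.5, 3.0])

# Put all detectors' scores on a comparable scale.
scores_norm = StandardScaler().fit_transform(scores)

# Two simple combiners from the example: average and maximization.
comb_avg = scores_norm.mean(axis=1)
comb_max = scores_norm.max(axis=1)
```

Because only the score matrix is combined, the deep models can be trained separately (outside SUOD) and still contribute a column alongside the accelerated classical detectors.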
Actually, I created another package called combo a few years ago for model combination: https://github.com/yzhao062/combo/blob/master/examples/detector_comb_example.py, although I am not sure whether deep learning models are compatible there.
Sorry for the late answer, I didn't get any notification for some reason.
The second option you outlined is what I had in mind. But I think the first option might work better for my current problem. The data is fairly noisy and the prediction probabilities are all over the place.
Considering, for example, an AutoEncoder, what would be the easiest way to extract the latent representations? Would I need to go through the Keras object or is there another way?