pyod
FileNotFoundError for bps_prediction.joblib when opening pickled model
I trained a model on one computer and then pickled it using joblib.dump. On another computer, I opened the model using joblib.load and got a FileNotFoundError, because bps_prediction.joblib is being opened from its path on the original computer, which differs from the path on the new computer.
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/var/folders/kn/bmgjf0611zsc41h256xmsz8c_nfjy5/T/ipykernel_15641/741748264.py in <module>
----> 1 clf.decision_function(X_positive)
~/.pyenv/versions/3.7.10/envs/$VIRTUALENV_NAME/lib/python3.7/site-packages/pyod/models/suod.py in decision_function(self, X)
258
259 # initialize the output score
--> 260 predicted_scores = self.model_.decision_function(X)
261
262 # standardize the score and combine
~/.pyenv/versions/3.7.10/envs/$VIRTUALENV_NAME/lib/python3.7/site-packages/suod/models/base.py in decision_function(self, X)
452 if self.bps_flag:
453 # load the pre-trained cost predictor to forecast the train cost
--> 454 cost_predictor = joblib.load(self.cost_forecast_loc_pred_)
455
456 time_cost_pred = cost_forecast_meta(cost_predictor, X,
~/.pyenv/versions/3.7.10/envs/$VIRTUALENV_NAME/lib/python3.7/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
575 obj = _unpickle(fobj)
576 else:
--> 577 with open(filename, 'rb') as f:
578 with _read_fileobject(f, filename, mmap_mode) as fobj:
579 if isinstance(fobj, str):
FileNotFoundError: [Errno 2] No such file or directory: '/home/local/$FOO/$USERNAME/.pyenv/versions/$VIRTUALENV_NAME/lib/python3.7/site-packages/suod/models/saved_models/bps_prediction.joblib'
I've omitted some of the exact values from the training system. $FOO is a directory that contains my home directory, which is $USERNAME. $VIRTUALENV_NAME is the name of the virtual environment I created using pyenv virtualenv 3.7.10 $VIRTUALENV_NAME.
It looks like when the model is trained, the path to the pre-trained cost predictor is saved in the model object itself, which prevents the model from being used on a computer where that path is different.
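The behaviour described above can be illustrated with a toy class. This is only a sketch: ToyEnsemble, its cost_path_ attribute, and the /home/alice path are hypothetical stand-ins rather than SUOD's actual code, and the standard library's pickle is used in place of joblib to keep the example self-contained. It shows that an absolute path stored as an attribute at training time travels inside the pickled object unchanged.

```python
import os
import pickle


class ToyEnsemble:
    """Hypothetical stand-in for a model that records a resource path when built."""

    def __init__(self, package_dir):
        # Absolute path computed on the machine where the object is created.
        self.cost_path_ = os.path.join(
            package_dir, "saved_models", "bps_prediction.joblib"
        )


# "Training machine": the path points into that machine's site-packages.
trained = ToyEnsemble("/home/alice/venv/site-packages/suod/models")

# Serialise and restore, as would happen when moving the file to another machine.
restored = pickle.loads(pickle.dumps(trained))

# The restored object still carries the training machine's path.
print(restored.cost_path_)
```

Nothing in the round trip recomputes the path, so the restored object looks for the file where it lived on the machine that created it.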
I tried manually setting clf.cost_forecast_loc_pred to the correct path to bps_prediction.joblib, but still got the same error. I don't have access to create a symlink to point to the original location. How can I get the object to load bps_prediction.joblib from the correct path?
Noted. We have not considered the use case of saving SUOD. This may be a bit involved since bps_prediction.joblib should be part of the suod package. Would you mind sharing a minimal example with a synthetic dataset for reproduction purposes?
Thank you for your quick response! Here's an example.
First, run this on one computer:
from pyod.models.suod import SUOD
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.copod import COPOD
from pyod.utils.utility import standardizer
from pyod.utils.data import generate_data
import joblib
contamination = 0.1
n_train = 200
n_test = 100
X_train, y_train, X_test, y_test = \
    generate_data(n_train=n_train,
                  n_test=n_test,
                  contamination=contamination,
                  random_state=42)
X_train, X_test = standardizer(X_train, X_test)
detector_list = [
    LOF(contamination=contamination, n_neighbors=10),
    LOF(contamination=contamination, n_neighbors=20),
    COPOD(contamination=contamination),
    IForest(contamination=contamination, n_estimators=100, max_samples=0.1),
    IForest(contamination=contamination, n_estimators=100, max_samples=0.1, max_features=0.5)
]
clf = SUOD(
    base_estimators=detector_list,
    contamination=contamination,
    n_jobs=1,
    combination='average',
    verbose=1
)
clf_name = 'SUOD'
detector_list = [LOF(n_neighbors=15), LOF(n_neighbors=20),
                 LOF(n_neighbors=25), LOF(n_neighbors=35),
                 COPOD(), IForest(n_estimators=100),
                 IForest(n_estimators=200)]
clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',
           verbose=False)
clf.fit(X_train)
joblib.dump(clf, 'model.pkl.bz2')
Then copy model.pkl.bz2 to another computer where the path to the virtualenv containing pyod/suod differs:
import joblib
from pyod.utils.utility import standardizer
from pyod.utils.data import generate_data
contamination = 0.1
n_train = 200
n_test = 100
clf = joblib.load('model.pkl.bz2')
X_train, y_train, X_test, y_test = \
    generate_data(n_train=n_train,
                  n_test=n_test,
                  contamination=contamination,
                  random_state=42)
X_train, X_test = standardizer(X_train, X_test)
clf.predict(X_test)
This problem will likely also occur if another user on the same computer that generated the model tries to load and predict with the model, assuming the second user lacks permission to access the virtualenv contained in the original user's home directory.
Hello!
I faced the same issue: you can either specify the location of bps_prediction.joblib (I read that you already did that and it didn't work), or you can save the model and bps_prediction.joblib in a specific folder where the other user has the needed permissions, and then specify the new location via cost_forecast_loc_pred. It worked for me.
Thanks @lecorveclucas ! Unfortunately, I'm operating in environments where I generally have limited permissions, so this isn't always an option for me. For instance, a process might build a model expecting a certain directory structure for the virtual environment, and then when the model is run on another system, I might not be able to recreate that structure.
Alright, but there is something I don't get: you can dump the trained model somewhere and then reuse it, but you can't dump bps_prediction.joblib in the same folder? I'm sorry, I might not have understood your answer, because I don't understand why you can load the trained model as another user but not bps_prediction.joblib, which would be in the same folder.
@lecorveclucas : The problem is that the model internally stores where it expects to find bps_prediction.joblib: it doesn't look for it in the current directory. It tries to find it in the folder containing the installed SUOD library, which can vary by user and machine.
According to the linked code I should be able to just modify .cost_forecast_loc_pred_. The type of the object I modified is pyod.models.suod.SUOD, so I think I'm setting this correctly... I'm at a loss as to what is going on.
This is the code that sets a default self.cost_forecast_loc_pred_. this_directory ends up as the location of the suod folder, such as: $HOME/.pyenv/versions/$VIRTUALENV_NAME/lib/python3.7/site-packages/suod/models/saved_models/bps_prediction.joblib
Indeed, I had the same issue with this_directory: I was considering changing it (to something like this_directory = os.getcwd()) because it is used ONLY for the paths of cost_forecast_loc_fit and cost_forecast_loc_pred. But because I can save bps_prediction.joblib in a folder with the trained model (and then specify the new location of cost_forecast_loc_pred), I didn't change the this_directory path. Did you try changing it?
Sorry, this might be a stupid question, but did you consider using Docker in order to have the same environment on both of your machines? It could greatly simplify the use and reuse of your models :)
@lecorveclucas: No, I didn't alter the source to change this_directory. But I did get the following to work:
- In the code on the system that trains the model, I copied bps_prediction.joblib to the working directory of the scripts and added the following to the call to SUOD(): cost_forecast_loc_pred='./bps_prediction.joblib'.
- When I loaded the pickled model on another system, the string representation of the SUOD object had: cost_forecast_loc_pred='./bps_prediction.joblib'. Deleting this file causes a failure, meaning that the one in the current working directory is being used.
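The first step above can be sketched roughly as follows. The helper name stage_cost_predictor is hypothetical, and the source path has to be filled in from wherever suod is installed on the training machine; only the cost_forecast_loc_pred keyword comes from this thread.

```python
import shutil
from pathlib import Path


def stage_cost_predictor(source, workdir="."):
    """Copy the cost predictor file into workdir and return the copy's path as a string."""
    source = Path(source)
    target = Path(workdir) / source.name
    shutil.copy2(source, target)  # copy2 also preserves file metadata
    return str(target)


# Usage on the training machine (commented out because it needs pyod/suod installed
# and a real path to the installed bps_prediction.joblib):
#
# local_path = stage_cost_predictor(path_to_installed_bps_prediction)
# clf = SUOD(base_estimators=detector_list,
#            cost_forecast_loc_pred=local_path)
```

Because the returned path is relative when workdir is relative, the file has to sit at the same relative location wherever the model is later loaded.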
After doing the above, I'm able to run decision_function()! In the end, it looks like I need to set cost_forecast_loc_pred at model training time to an easily accessible path, as something prevents the model object from recognizing that this has been changed once the model has been trained.
Docker would address the environment problem, but it'd also present other tasks in terms of getting approval and managing the security of the images. What I've described is working for me, but I'd like to figure out why I can't change cost_forecast_loc_pred after the fact... I need to dig into this more, but perhaps the original value or the pre-trained model is being cached somewhere else in the object.
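One way to look for where the path is held: the traceback shows the failing joblib.load reading self.cost_forecast_loc_pred_ inside suod/models/base.py, reached through self.model_ on the pyod wrapper, so the value may live on a nested object rather than on clf itself. That is an inference from the traceback, not something confirmed in this thread. The helper below, find_path_attrs, is a hypothetical utility that walks an object's attributes and reports every string containing a given fragment; the Inner and Wrapper classes are toy stand-ins, not the real SUOD classes.

```python
def find_path_attrs(obj, needle, prefix="obj", depth=3, seen=None):
    """Yield (dotted_name, value) for string attributes that contain `needle`."""
    seen = set() if seen is None else seen
    # Stop on exhausted depth, objects without attributes, or already-visited objects.
    if depth < 0 or id(obj) in seen or not hasattr(obj, "__dict__"):
        return
    seen.add(id(obj))
    for name, value in vars(obj).items():
        dotted = f"{prefix}.{name}"
        if isinstance(value, str):
            if needle in value:
                yield dotted, value
        else:
            # Recurse into nested objects such as a wrapped inner model.
            yield from find_path_attrs(value, needle, dotted, depth - 1, seen)


class Inner:  # toy stand-in for a wrapped model
    def __init__(self):
        self.cost_forecast_loc_pred_ = (
            "/old/site-packages/suod/models/saved_models/bps_prediction.joblib"
        )


class Wrapper:  # toy stand-in for the outer object
    def __init__(self):
        self.cost_forecast_loc_pred = "./bps_prediction.joblib"
        self.model_ = Inner()


hits = dict(find_path_attrs(Wrapper(), "bps_prediction", prefix="clf"))
for name, value in hits.items():
    print(name, "->", value)
```

If running this on a real loaded model reports a stale path under clf.model_, assigning the new location to that nested attribute before calling decision_function() might be worth trying; that has not been tested here.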
@muraiki, I have just found your response to this problem. I have tried the first part when training, but I'm not sure how to proceed with the second point. I am following what is specified for loading a model here: https://suod.readthedocs.io/en/latest/model_persistence.html. Thanks in advance.
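On the second point: a relative path such as './bps_prediction.joblib' is resolved against the current working directory of the process that loads the model, so the file needs to be present there before predicting. A minimal sketch of a pre-flight check, assuming the relative path used earlier in this thread (resolve_from_cwd is a hypothetical helper name):

```python
from pathlib import Path


def resolve_from_cwd(rel_path, cwd=None):
    """Return (absolute_path, exists) for rel_path resolved against cwd."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    resolved = (base / rel_path).resolve()
    return resolved, resolved.is_file()


# Check where the relative path points for the current process.
resolved, found = resolve_from_cwd("./bps_prediction.joblib")
print(f"{resolved} exists: {found}")
```

If the check reports that the file is missing, copying bps_prediction.joblib next to the loading script, or changing into the directory that contains it, should let the stored relative path resolve.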