cuml
[BUG] Pickle Approximate NearestNeighbors models
Describe the bug
Approximate nearest neighbor models ('ivfflat', 'ivfpq') store their state in a knnIndex object. Currently there is no support for pickling models that were fitted using these algorithms. The error only shows up when predicting with the loaded model.
Steps/Code to reproduce bug
import pickle

import cudf
from cuml.neighbors import NearestNeighbors
from cuml.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=5, n_features=10, random_state=42)
X_cudf = cudf.DataFrame(X)

# fit model
model = NearestNeighbors(n_neighbors=3, algorithm='ivfflat')
model.fit(X)

# pickle the model (the context manager closes the file after writing)
with open("ann_model.pkl", "wb") as f:
    pickle.dump(model, f)
Now start a new process (e.g. new Jupyter kernel)
import pickle

import cudf
from cuml.neighbors import NearestNeighbors
from cuml.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=5, n_features=10, random_state=42)
X_cudf = cudf.DataFrame(X)

# load the file written by the first process
with open("ann_model.pkl", "rb") as f:
    model_loaded = pickle.load(f)

distances2, indices2 = model_loaded.kneighbors(X_cudf)
This will result in the process dying. The probable cause is that the model state is accessed through the knnIndex pointer, which was saved and restored as a plain int value and no longer points to a valid object once the process has been restarted. (One can see this by inspecting the 'knn_index' value in the dict returned by model.__getstate__().)
Expected behavior
Pickling and loading the model should work. To achieve this, ANN models need to serialize and deserialize their knnIndex object as part of pickling the model.
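One common way to get this behavior in Python is to override `__getstate__` and `__setstate__` so that the index contents, rather than a pointer, are written to the pickle. The sketch below shows that pattern in pure Python. `NativeIndex`, its `to_bytes`/`from_bytes` methods, and `ToyANNModel` are hypothetical names used for illustration; they are not part of cuML or FAISS, and a real knnIndex would need its own serialization routine.

```python
import pickle


class NativeIndex:
    """Hypothetical stand-in for a native index object such as knnIndex."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def to_bytes(self):
        # Assumed serialization hook; a real index would dump its own buffers.
        return pickle.dumps(self.rows)

    @classmethod
    def from_bytes(cls, payload):
        return cls(pickle.loads(payload))


class ToyANNModel:
    """Toy model that serializes its index contents instead of a handle."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors
        self._index = None

    def fit(self, rows):
        self._index = NativeIndex(rows)
        return self

    def __getstate__(self):
        # Replace the live index with a byte payload that survives a restart.
        state = self.__dict__.copy()
        index = state.pop("_index")
        state["index_bytes"] = None if index is None else index.to_bytes()
        return state

    def __setstate__(self, state):
        # Rebuild a fresh index object from the stored payload.
        payload = state.pop("index_bytes")
        self.__dict__.update(state)
        self._index = None if payload is None else NativeIndex.from_bytes(payload)


model = ToyANNModel().fit([[0.0, 1.0], [2.0, 3.0]])
restored = pickle.loads(pickle.dumps(model))
```

After the round trip, `restored._index` is a newly built object holding the same rows, so nothing in the pickle depends on memory owned by the process that wrote it.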
Environment details:
- Tested using 22.04 conda packages.
Thank you for spotting that. Unfortunately, it looks like there is no simple solution for this right now, because the knnIndex struct contains GPU resources handled by FAISS. If we develop our own ANN algorithms, however, it might become easier to serialize the necessary data.
Yes, I agree. We can return to this question after https://github.com/rapidsai/raft/pull/652
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.