Serializing UMAP
Hi! Thanks for implementing UMAP, it's very handy! When serializing a trained UMAP object via pickle, I got the following error:
pickle.dump(myumap, open('pickle.pkl', 'w'))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
A workaround I've found is:
pickle.dump(myumap, open('pickle.pkl', 'wb'), protocol=-1)
Do you think it is safe? Is there a better/recommended serialization approach?
Sadly I think that is the recommended approach at this point. You could also look at using joblib for serialization instead of pickle.
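A minimal sketch of why the protocol argument matters, using a hypothetical `Slotted` class as a stand-in (it is not part of UMAP). Pickle protocols 0 and 1 reject a class that defines `__slots__` without `__getstate__`, while `protocol=-1` selects the highest available protocol, which handles it. The file is opened in binary mode here, since the higher protocols are binary formats.

```python
import os
import pickle
import tempfile


class Slotted:
    """Hypothetical stand-in for an object that uses __slots__."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


obj = Slotted(42)

# Protocols 0 and 1 cannot handle __slots__ without __getstate__.
try:
    pickle.dumps(obj, protocol=0)
    low_protocol_error = None
except TypeError as exc:
    low_protocol_error = str(exc)

# protocol=-1 selects pickle.HIGHEST_PROTOCOL, a binary format,
# so the file is opened with "wb" for writing and "rb" for reading.
path = os.path.join(tempfile.mkdtemp(), "slotted.pkl")
with open(path, "wb") as fh:
    pickle.dump(obj, fh, protocol=-1)

with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(low_protocol_error)
print(restored.value)
```

Whether the resulting file stays loadable across library versions is a separate question that this sketch does not address.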
I'm getting a different error when I try to pickle a fitted UMAP object, even when I use protocol=-1:
TypeError: can't pickle _nrt_python._MemInfo objects
I also tried using wrap_non_picklable_objects from joblib, and that didn't work either.
I'd like to do this because:
- the wonderful new plotting interface in 0.4 requires a UMAP object, rather than the transformed data
- the parameters that I used are stored on that object, which is convenient for reproduction
Is there a workaround that you know of? For now I'll just save the transformed data and parameters, and plot things myself, but I think it would be nice to be able to simply serialize the object.
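The fallback described above (saving the transformed data and the parameters instead of the object) could look roughly like the sketch below. The helper names and the JSON layout are hypothetical, and the values are placeholders. With a fitted model, the embedding would presumably come from its `embedding_` attribute and the parameters from `get_params()`, assuming the usual scikit-learn estimator interface.

```python
import json
import os
import tempfile


def save_embedding_and_params(path, embedding, params):
    """Write the embedding rows and the parameter dict to a JSON file."""
    payload = {
        "params": params,
        # Convert each row to a plain list of floats so arrays or tuples serialize too.
        "embedding": [list(map(float, row)) for row in embedding],
    }
    with open(path, "w") as fh:
        # default=str is a fallback for values JSON cannot encode directly.
        json.dump(payload, fh, default=str)


def load_embedding_and_params(path):
    """Read back the embedding (list of lists) and the parameter dict."""
    with open(path) as fh:
        payload = json.load(fh)
    return payload["embedding"], payload["params"]


# Placeholder values; with a fitted model these would be the real ones.
embedding = [(0.1, 1.5), (2.0, -0.3)]
params = {"n_neighbors": 15, "min_dist": 0.1, "metric": "euclidean"}

path = os.path.join(tempfile.mkdtemp(), "umap_result.json")
save_embedding_and_params(path, embedding, params)
loaded_embedding, loaded_params = load_embedding_and_params(path)
```

This keeps the results reproducible and plottable by hand, but the saved file cannot transform new data.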
It seems that the _nrt_python._MemInfo error is caused by the attempt to serialize pynndescent._rp_trees.FlatTree objects.
I think the simplest thing is to delete the _rp_trees attribute. You'll only need it if you want to transform new data, and for the use cases you describe that should be fine. I'll try to figure out a more long-term fix when I get some time.
For Googlers: the relevant attribute seems to be _rp_forest. Deleting this allowed the UMAP objects to be pickled.
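A minimal sketch of this workaround, assuming the attribute is named `_rp_forest` as reported above. `FittedModel` and `dumps_without` are hypothetical names, and a lock stands in for the unpicklable trees so the example runs without UMAP installed. The helper removes the attribute from a shallow copy, so the original object keeps it.

```python
import copy
import pickle
import threading


class FittedModel:
    """Hypothetical stand-in for a fitted model with one unpicklable attribute."""

    def __init__(self):
        self.embedding_ = [[0.0, 1.0], [1.0, 0.0]]
        self._rp_forest = threading.Lock()  # a lock cannot be pickled


def dumps_without(obj, attr_names):
    """Pickle a shallow copy of obj with the named attributes removed."""
    clone = copy.copy(obj)  # the original keeps its attributes
    for name in attr_names:
        if hasattr(clone, name):
            delattr(clone, name)
    return pickle.dumps(clone, protocol=pickle.HIGHEST_PROTOCOL)


model = FittedModel()
data = dumps_without(model, ["_rp_forest"])
restored = pickle.loads(data)
```

As noted in the thread, an object restored this way would not be able to transform new data.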
Bump. I need to use a loaded UMAP object (via pickle, joblib, etc.) for inference, which is exactly the case where the trees are needed ("You'll only need [._rp_trees] if you want to transform new data"). Incidentally, I think future inference may be a main reason people want to serialize the object. Thanks for the project!
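One general pattern for keeping inference available after loading is to drop the unpicklable part in `__getstate__` and rebuild it on first use. The sketch below uses hypothetical `Index` and `Model` classes and is not how UMAP is implemented. It also assumes the index can be rebuilt from the stored training data, which may not reproduce the original trees exactly if their construction is randomized.

```python
import pickle
import threading


class Index:
    """Hypothetical search index that cannot be pickled."""

    def __init__(self, points):
        self.points = list(points)
        self._lock = threading.Lock()  # makes the object unpicklable

    def nearest(self, x):
        return min(self.points, key=lambda p: abs(p - x))


class Model:
    """Hypothetical model that rebuilds its index after unpickling."""

    def __init__(self, training_points):
        self.training_points = list(training_points)
        self._index = Index(self.training_points)

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_index"] = None  # drop the unpicklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    def transform(self, x):
        if self._index is None:  # rebuild lazily on first use after loading
            self._index = Index(self.training_points)
        return self._index.nearest(x)


model = Model([1.0, 4.0, 9.0])
restored = pickle.loads(pickle.dumps(model))
result = restored.transform(3.5)
```

The trade-off is a slower first call after loading, in exchange for a file that contains only picklable state.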