
Serializing UMAP

Open martinobertoni opened this issue 5 years ago • 7 comments

Hi! Thanks for implementing UMAP; it's very handy. When serializing a trained UMAP object via pickle, I got the following error:

pickle.dump(myumap, open('pickle.pkl', 'wb'))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

A workaround I've found is:

pickle.dump(myumap, open('pickle.pkl', 'wb'), protocol=-1)

Do you think it is safe? Is there a better/recommended serialization approach?

martinobertoni, Aug 09 '19 13:08

Sadly I think that is the recommended approach at this point. You could also look at using joblib for serialization instead of pickle.

lmcinnes, Aug 10 '19 19:08
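A minimal sketch of the workaround discussed above, assuming Python 3: open the file in binary mode and ask for the highest pickle protocol (which is what protocol=-1 selects). The helper names save_pickle and load_pickle are hypothetical, and a SimpleNamespace stands in for the fitted UMAP object, so this only illustrates the round trip, not UMAP itself. With joblib installed, joblib.dump(obj, path) and joblib.load(path) could be used in the same way.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_pickle(obj, path):
    """Pickle obj to path using the highest protocol available."""
    # Binary mode ("wb") is required: pickle writes bytes, not text.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load a pickled object back from path."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a fitted model (assumption: the real object would be a UMAP instance).
model = SimpleNamespace(n_neighbors=15, embedding=[[0.1, 0.2], [0.3, 0.4]])

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_pickle(model, path)
restored = load_pickle(path)
```

As with any pickle, the file should only be loaded from a trusted source, and it may not load under a different library version than the one that wrote it.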

I'm getting a different error when I try to pickle a fitted UMAP object, even when I use protocol=-1:

TypeError: can't pickle _nrt_python._MemInfo objects

I also tried using wrap_non_picklable_objects from joblib, and that didn't work either.

I'd like to do this because:

  • the wonderful new plotting interface in 0.4 requires a UMAP object, rather than the transformed data
  • the parameters that I used are stored on that object, which is convenient for reproduction

Is there a workaround that you know of? For now I'll just save the transformed data and parameters, and plot things myself, but I think it would be nice to be able to simply serialize the object.

wmayner, Jan 03 '20 19:01
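The fallback mentioned above (saving the transformed data and the parameters instead of the object) could look roughly like the sketch below. It uses only the standard library and plain lists; the function names and the example values are hypothetical. With a real model, the inputs would presumably come from something like reducer.embedding_.tolist() and reducer.get_params(), which is an assumption about the fitted object rather than something shown in this thread.

```python
import json
import os
import tempfile


def save_embedding_and_params(embedding_rows, params, path):
    """Write embedding coordinates and model parameters to one JSON file."""
    payload = {"params": params, "embedding": embedding_rows}
    with open(path, "w") as fh:
        # default=repr stores values JSON cannot encode (e.g. callables) as text.
        json.dump(payload, fh, default=repr)


def load_embedding_and_params(path):
    """Read back the (embedding, params) pair written by the function above."""
    with open(path) as fh:
        payload = json.load(fh)
    return payload["embedding"], payload["params"]


# Hypothetical example values standing in for a real embedding and parameters.
embedding_rows = [[0.12, -1.5], [2.25, 0.75]]
params = {"n_neighbors": 15, "min_dist": 0.1, "metric": "euclidean"}

path = os.path.join(tempfile.mkdtemp(), "umap_run.json")
save_embedding_and_params(embedding_rows, params, path)
loaded_embedding, loaded_params = load_embedding_and_params(path)
```

This keeps the run reproducible and plottable by hand, but it does not restore a UMAP object, so anything that needs the object itself is still out of reach.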

It seems that the _nrt_python._MemInfo error is caused by the attempt to serialize pynndescent._rp_trees.FlatTree objects.

wmayner, Jan 24 '20 19:01

I think the simplest thing is to delete the _rp_trees attribute. You'll only need it if you want to transform new data, so for the use cases you describe, deleting it should be fine. I'll try to figure out a longer-term fix when I get some time.

lmcinnes, Jan 25 '20 14:01

For Googlers: the relevant attribute seems to be _rp_forest. Deleting this allowed the UMAP objects to be pickled.

wmayner, Jan 30 '20 18:01
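A rough sketch of the "delete the attribute, then pickle" approach described above. The helper dumps_without is hypothetical, and a SimpleNamespace holding a threading.Lock stands in for a fitted UMAP object with an unpicklable _rp_forest. Whether a real UMAP object has that attribute depends on the installed version, so treat the name as an assumption taken from this thread.

```python
import copy
import pickle
import threading
from types import SimpleNamespace


def dumps_without(obj, attr_names):
    """Pickle a shallow copy of obj with the named attributes removed."""
    stripped = copy.copy(obj)  # the original object keeps its attributes
    for name in attr_names:
        if hasattr(stripped, name):
            delattr(stripped, name)
    return pickle.dumps(stripped, protocol=pickle.HIGHEST_PROTOCOL)


# Stand-in model: the lock plays the role of the unpicklable forest.
model = SimpleNamespace(n_neighbors=15, _rp_forest=threading.Lock())

# Pickling the object directly fails because of the unpicklable attribute.
try:
    pickle.dumps(model)
    pickled_directly = True
except TypeError:
    pickled_directly = False

# Pickling a stripped copy succeeds and leaves the original untouched.
payload = dumps_without(model, ["_rp_forest"])
restored = pickle.loads(payload)
```

Working on a copy means the in-memory model can still transform new data; per the comments above, the restored object presumably cannot, since the deleted attribute is what that step relies on.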

Bump. I need to use a loaded UMAP object (via pickle, joblib, etc.) for inference, which is exactly the case covered by "You'll only need [_rp_trees] if you want to transform new data". Incidentally, I think future inference might be a main reason people want to serialize in the first place. Thanks for the project!

lefnire, Aug 01 '20 07:08