Serialization

Open wmayner opened this issue 5 years ago • 5 comments

Some objects in pynndescent are not serializable by pickle or even cloudpickle, for example pynndescent.rp_trees.FlatTree. This prevents serialization of UMAP fit objects when pynndescent is used.
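
One way to narrow down which attribute of a fitted object blocks pickling is to try each attribute on its own. The helper below is a minimal sketch using only the standard library; `find_unpicklable_attrs` and the `Model` class are hypothetical names for illustration, not part of pynndescent or UMAP, and a `threading.Lock` stands in for the problematic attribute.

```python
import pickle
import threading


def find_unpicklable_attrs(obj):
    """Return the names of instance attributes that pickle refuses to serialize."""
    failures = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle can raise several different exception types
            failures.append(name)
    return failures


class Model:
    """Hypothetical stand-in for a fitted estimator."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]  # plain list: picklable
        self.lock = threading.Lock()    # lock object: not picklable


print(find_unpicklable_attrs(Model()))
```

The same loop could be pointed at a fitted UMAP object to see which attributes fail, though that has not been checked here.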

Relevant UMAP issue: lmcinnes/umap#273

wmayner avatar Jan 24 '20 19:01 wmayner

This seems to be related to how Numba handles things. Hopefully there is a reasonable work-around. It will take some time to figure out the right way to handle this.

lmcinnes avatar Jan 25 '20 14:01 lmcinnes

Potentially I have a fix here. Hopefully this works... I won't have time to test it for a while as I'm travelling very soon.

lmcinnes avatar Jan 25 '20 17:01 lmcinnes

Thanks a lot, I'll give it a try!

wmayner avatar Jan 25 '20 18:01 wmayner

Thanks! This patch seems to work but only for the sparse case. To handle the dense case I changed renumbaify_tree to the following, which seems to fix the issue:

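def renumbaify_tree(tree):
    # A dense hyperplane is a 1-D array; a sparse one has more dimensions,
    # so pick the element type of the typed list from the first hyperplane.
    if tree.hyperplanes[0].ndim == 1:
        hyperplanes = numba.typed.List.empty_list(dense_hyperplane_type)
    else:
        hyperplanes = numba.typed.List.empty_list(sparse_hyperplane_type)
    ...  # rest of the function elided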

With this change I'm able to serialize a UMAP instance using joblib, and load it back.
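
The general idea behind this kind of fix can be sketched without Numba: convert the typed container to plain Python data before pickling, then rebuild it on load, choosing the element type from the data itself. Everything below is an assumption for illustration and not pynndescent's actual code; `TypedList`, `to_plain` and `from_plain` are hypothetical stand-ins, and nested lists play the role of arrays.

```python
import pickle


class TypedList:
    """Hypothetical stand-in for a typed container that pickle rejects."""

    def __init__(self, kind, items=()):
        self.kind = kind          # "dense" or "sparse"
        self.items = list(items)

    def __reduce__(self):
        raise TypeError("TypedList cannot be pickled")


def to_plain(hyperplanes):
    """Convert the typed container to a plain list that pickle accepts."""
    return list(hyperplanes.items)


def from_plain(plain):
    """Rebuild the typed container, choosing the kind from the first entry."""
    # Here a dense hyperplane is a flat list of floats; a sparse one is nested.
    first = plain[0]
    kind = "sparse" if isinstance(first[0], list) else "dense"
    return TypedList(kind, plain)


dense = TypedList("dense", [[0.5, -0.5], [1.0, 0.0]])
payload = pickle.dumps(to_plain(dense))       # only plain lists are pickled
restored = from_plain(pickle.loads(payload))
print(restored.kind, restored.items)
```

Hard-coding one kind in `from_plain` would mirror the situation described above, where the rebuild step only handled one of the two cases.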

adilosa avatar Apr 17 '20 00:04 adilosa

@adilosa's snippet also got me unstuck.

jpambrun avatar Jun 03 '20 19:06 jpambrun