Pickle issue with load_ParametricUMAP
Describe the bug
gunicorn[89838], Jun 19 17:36:52:
  File "/var/app/current/application.py", line 478, in load_stuff
    model = load_ParametricUMAP(model_set + '/' + full_name,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/tensorml/lib/python3.11/site-packages/umap/parametric>
    model = pickle.load((open(model_output, "rb")))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mambaforge/envs/tensorml/lib/python3.11/site-packages/numba/core/seri>
    ctor, states = loads(serialized)
                   ^^^^^^^^^^^^^^^^^
TypeError: code() argument 13 must be str, not int
To Reproduce
Steps to reproduce the behavior:
- environment: Ubuntu 20.04, Python 3.11, umap-learn==0.5.3
- create an embedding:
import umap

# model_vectors, data_path and date_prefix are defined elsewhere in the application
distance = 'sokalsneath'
op_mix_ratio = 0.3
embed_dim = 10
reducer = umap.ParametricUMAP(
    random_state=42,
    transform_seed=42,
    n_neighbors=15,
    n_epochs=500,
    metric=distance,
    min_dist=0.0,
    set_op_mix_ratio=op_mix_ratio,
    n_components=embed_dim,
)
mapper = reducer.fit(model_vectors)
mapper.save(data_path + '/' + date_prefix + '/' +
            date_prefix + '_umap_mapper.umap')
- attempt to load the model on a different Linux machine using load_ParametricUMAP()
Expected behavior
On another machine this worked. I believe it is a subtle pickle issue. I had issues with other pickle files, which were solved by using pickle.dump(object, open(filename, "wb"), protocol=2). I have not figured out how to get umap to use that protocol.
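For reference, the workaround mentioned above can be sketched with the standard library alone. This is a minimal illustration, not something umap exposes: the helper names `dump_with_protocol` and `load_pickle` are hypothetical, and the sketch does not go through `ParametricUMAP.save()`. The protocol only selects the byte format of the pickle stream, so it may well have no effect on an error raised while a code object is being rebuilt.

```python
import pickle
import tempfile
from pathlib import Path


def dump_with_protocol(obj, filename, protocol=2):
    """Pickle obj to filename with an explicit protocol (hypothetical helper)."""
    # pickle.dump needs a file opened in binary write mode.
    with open(filename, "wb") as fh:
        pickle.dump(obj, fh, protocol=protocol)


def load_pickle(filename):
    """Load a pickled object from filename (hypothetical helper)."""
    with open(filename, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "example.pkl"
        dump_with_protocol({"n_components": 10}, target)
        print(load_pickle(target))
```

Using `with` blocks closes the file handles promptly, which the one-line `pickle.dump(object, open(...))` form does not guarantee.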
Desktop (please complete the following information):
- OS: Windows 11 Pro, running WSL 2 with Ubuntu 20.04
Update: this may be a Python 3.11-related issue. I have tested downgrading the server to Python 3.9 and things seem to work then. I did try loading Python 3.11 on my dev system and re-saving the model, but still got the error on the Python 3.11 server.
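Since the failure appears tied to the Python version, one way to catch a mismatch before unpickling is to record the interpreter version next to the saved model and compare it at load time. The following is a sketch under that assumption only; `write_python_stamp`, `python_matches_stamp` and the `.pyversion.json` sidecar file are hypothetical names, not part of umap-learn.

```python
import json
import sys
from pathlib import Path


def _stamp_path(model_path):
    # Sidecar file stored next to the saved model (hypothetical naming scheme).
    return Path(str(model_path) + ".pyversion.json")


def write_python_stamp(model_path):
    """Record the (major, minor) version of the interpreter that saved the model."""
    stamp = _stamp_path(model_path)
    stamp.write_text(json.dumps({"python": list(sys.version_info[:2])}))
    return stamp


def python_matches_stamp(model_path, current=None):
    """Return True if the running interpreter matches the recorded version."""
    saved = tuple(json.loads(_stamp_path(model_path).read_text())["python"])
    if current is None:
        current = sys.version_info[:2]
    return saved == tuple(current)
```

A loader could call `python_matches_stamp(path)` first and raise a clear error instead of the opaque `TypeError` when the versions differ.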
Hey, can you try this branch to see if it resolves the issue on Python 3.11? https://github.com/lmcinnes/umap/pull/1123
I can confirm this is related to the python version. How should I proceed?
@timsainb I can see that #1123 has conflicts to be resolved. Is it in a shape that I could use to build a custom version and see whether it solves the issue, or do you want to rebase first?
We are just about to pull in an updated version of Parametric UMAP (https://github.com/lmcinnes/umap/pull/1153), so my plan is to wait until that is pulled in and then integrate #1123.