Unable to load a saved model
I am trying to load a saved model so that I can avoid wasting time building the index each time I run my code.
An example that produces the error is attached.
I found that following your save examples works fine; it's just when trying to combine the first search example with a saved model that things seem to go awry.
```
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot bind 'args=(Array(int32, 1, 'C', True, aligned=True), Array(float32, 1, 'C', True, aligned=True), Array(int32, 1, 'C', False, aligned=True), Array(float32, 1, 'C', False, aligned=True)) kws={}' to signature '(x, y)' due to "TypeError: too many positional arguments".
During: resolving callee type: type(CPUDispatcher(<function alternative_cosine at 0x10a169ee0>))
During: typing of call at /Users/lucas/K8_Projects/GameLauncher/.venv/lib/python3.13/site-packages/pynndescent/pynndescent_.py (1560)

File ".venv/lib/python3.13/site-packages/pynndescent/pynndescent_.py", line 1560:
def search_closure(

During: Pass nopython_type_inference
```
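For reference, the save/load pattern I'm following is a plain dump-and-load round trip. Here's a minimal runnable sketch with a stand-in dict (the real object would be an indexed neofuzz `Process`; `joblib.dump`/`joblib.load` work the same way as pickle here):

```python
import os
import pickle
import tempfile

# Stand-in for an indexed neofuzz Process; the real object would be
# created with char_ngram_process() followed by process.index(corpus).
model = {"index": [1, 2, 3], "options": ["foo", "bar"]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)      # save once, after building the index

with open(path, "rb") as f:
    restored = pickle.load(f)  # reload on later runs to skip re-indexing

print(restored == model)  # → True
```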
This is quite curious, since we have a test case for this, and the tests are passing. Can you provide info on which Python version you're using, along with your PyNNDescent version? My guess is that it must be something fishy with the Numba, Python, and PyNNDescent versioning that's causing you headaches.
I am using Python 3.13, neofuzz 0.3.4, and pynndescent 0.5.13.
What version of joblib are you using?
Also, is using an older version of Python an option?
I am using Joblib 1.4.2
I can use any version of Python I want for this. If you can share your environment details, I can set mine to match.
having the same issue
@NorthIsUp can I get your python, numba, pynndescent, numpy and joblib versions? I'm trying to look into this, but so far I'm failing to reproduce the error.
It was the latest of everything and python 3.13
This is very curious. I've installed the latest versions of all packages along with Python 3.13, and the tests are still passing. Can you share a minimum reproducible example of this with me?
When I'm back at work I'll get the complete dependency list for ya
I think the reason your tests pass (the test passes for me too) is that my example uses a different process than your test covers. Your test covers `process = Process(make_pipeline(SubWordVectorizer(), NMF(20)))`, while I am using `process = char_ngram_process(tf_idf=False)`.
Right, okay, I can take a look at that. Though my guess would be that this shouldn't introduce any problems, since the char n-gram process only uses scikit-learn components. Thanks for the tip @lucas-vizru
@x-tabdeveloping: Did you find any solution for this? I am also struggling with it. I also tried pickle and marshal, but no success :-(
In my case, I get the error message `too many positional arguments`. I get the same when I use pickle to save the process directly :-(
I use the following libs: neofuzz 0.3.4, joblib 1.4.2, pynndescent 0.5.13.
Is the process object too complex for pickle.dump/load?
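On the "too complex" question: the likely issue isn't size but that pynndescent builds its search routine as a numba-jitted local closure (`search_closure` in the traceback above), and locally defined closures are fundamentally awkward for pickle. This is only my reading of the traceback, but a stdlib-only analogy (not pynndescent's actual code) shows the basic problem:

```python
import pickle

def build_index():
    # Mimics pynndescent's pattern of returning a locally defined
    # search function; the numba-jitted original has the same shape.
    def search_closure(query):
        return query * 2
    return search_closure

search = build_index()
err = None
try:
    pickle.dumps(search)  # local closures can't be pickled
except (AttributeError, pickle.PicklingError) as exc:
    err = exc

print("pickle failed:", err)
```

pynndescent works around this with custom serialization hooks, which is presumably where the version-specific breakage creeps in.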
All (@x-tabdeveloping),
After more debugging I discovered that saving and loading the model works as long as you use neofuzz == 0.3.3.
NOTE: The actual problem is that neofuzz 0.3.4 depends on pynndescent 0.5.13, which breaks the ability to save the model; with the earlier pynndescent that neofuzz 0.3.3 pulls in, it works.
Confirmed, older version of neofuzz works.
I also had to use numpy==1.26.4
That is due to this error:
`AttributeError: np.infty was removed in the NumPy 2.0 release. Use np.inf instead.`
That said, 0.3.4 works fine with numpy 2.0, so no worries on that front.
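Pulling the workaround together, a pinned requirements fragment along these lines reportedly restores save/load (versions taken from the comments above; note that numpy 1.26.x predates Python 3.13 support, so an older Python may be needed alongside it):

```
neofuzz==0.3.3
numpy==1.26.4
joblib==1.4.2
```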
Hey all! Sorry for the silence. It unfortunately seems to me that there is not much I can do to fix this issue. Multiple people have already raised it with PyNNDescent's developers, but there hasn't been any progress on making indices properly saveable, and there hasn't been a release for quite a few months now.
I am considering moving the library to a different backend like FAISS, Chroma, AnnLite, or Annoy, but it will take time, especially since I'm a little occupied with a bunch of other projects.
I know this is an uncomfortable situation for all of you, but the best I can recommend is downgrading the library.
If any of you by any chance have spare time to submit a PR on a backend switch, I would be more than happy to review your code and help implement it.
Actually, it was easier than I expected. I'm working on migrating the library to Annoy in this PR: https://github.com/x-tabdeveloping/neofuzz/pull/17
Version 0.4.0 now uses Annoy instead of PyNNDescent, so this issue should be resolved; feel free to try it out!
Thanks for the support!
Library issues SUCK so I really appreciate it man!
Hi, thanks for your great work!
I have updated to the latest github version and it doesn't work for me, unfortunately. I get this error:
```
Cell In[13], line 4
      1 corpus = gene_ont["symbol"].dropna().drop_duplicates().to_list()[1:10]
      3 process = char_ngram_process(metric = 'angular')
----> 4 process.index(corpus)
      6 process.extract("your query", limit=30, refine_levenshtein=True)

File ~/Documents/lamin_test/.venv/lib/python3.12/site-packages/neofuzz/process.py:79, in Process.index(self, options)
     77 self.nearest_neighbours = AnnoyIndex(n_dimensions, self.metric)
     78 for i_option, vector in enumerate(vectors):
---> 79     self.nearest_neighbours.add_item(i_option, vector)
     80 self.nearest_neighbours.build(self.n_trees, n_jobs=self.n_jobs)

File ~/Documents/lamin_test/.venv/lib/python3.12/site-packages/scipy/sparse/_base.py:378, in _spbase.__len__(self)
    377 def __len__(self):
--> 378     raise TypeError("sparse array length is ambiguous; use getnnz()"
    379                     " or shape[0]")

TypeError: sparse array length is ambiguous; use getnnz() or shape[0]
```
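Reading the traceback, the vectorizer appears to hand scipy sparse rows to `AnnoyIndex.add_item`, which expects a dense sequence of floats; even calling `len()` on a sparse row raises exactly this TypeError. A small demonstration, plus the densify step that would likely sidestep it (a hypothetical fix on my part, not the library's actual code):

```python
from scipy.sparse import csr_matrix

# Sparse output like a char n-gram / TF-IDF vectorizer would produce.
vectors = csr_matrix([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]])

row = vectors[0]
err = None
try:
    len(row)  # length of sparse data is ambiguous -> TypeError
except TypeError as exc:
    err = exc
print(err)

# Densifying each row before AnnoyIndex.add_item avoids the problem:
dense = row.toarray().ravel()
print(len(dense))  # → 3
```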