Unable to load a saved model

Open lucas-vizru opened this issue 10 months ago • 19 comments

I am trying to load a saved model so that I can avoid wasting time building the index each time I run my code.

An example that produces the error is attached.

I found that following your save examples works fine; it's only when I try to combine the first search example with a saved model that things go awry.

```
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot bind 'args=(Array(int32, 1, 'C', True, aligned=True), Array(float32, 1, 'C', True, aligned=True), Array(int32, 1, 'C', False, aligned=True), Array(float32, 1, 'C', False, aligned=True)) kws={}' to signature '(x, y)' due to "TypeError: too many positional arguments".
During: resolving callee type: type(CPUDispatcher(<function alternative_cosine at 0x10a169ee0>))
During: typing of call at /Users/lucas/K8_Projects/GameLauncher/.venv/lib/python3.13/site-packages/pynndescent/pynndescent_.py (1560)

File ".venv/lib/python3.13/site-packages/pynndescent/pynndescent_.py", line 1560:
def search_closure(
    d = np.float32(
        dist(
        ^

During: Pass nopython_type_inference
```

failExample.py.txt
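A minimal sketch of the failing pattern (reconstructed from the description above rather than taken from the attached file; the joblib round-trip and the import path are assumptions):

```python
# Sketch only: persist a neofuzz process with joblib, reload it, then query.
# The corpus and file name are placeholders.
import joblib
from neofuzz import char_ngram_process

corpus = ["Windows", "Linux", "macOS"]

process = char_ngram_process()
process.index(corpus)                    # building the index works
joblib.dump(process, "process.joblib")   # saving works

process = joblib.load("process.joblib")  # loading appears to succeed...
process.extract("windws", limit=3)       # ...but querying raises the TypingError above
```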

lucas-vizru avatar Feb 07 '25 23:02 lucas-vizru

This is quite curious, since we have a test case for this, and the tests are passing. Can you provide info on which neofuzz version you're using, along with your PyNNDescent version? My guess is that something fishy with the Numba, Python and PyNNDescent versioning is causing you headaches.

x-tabdeveloping avatar Feb 08 '25 16:02 x-tabdeveloping

I am using

Python 3.13, neofuzz 0.3.4, pynndescent 0.5.13

lucas-vizru avatar Feb 09 '25 03:02 lucas-vizru

What version of joblib are you using?

Also, is using an older version of Python an option?

x-tabdeveloping avatar Feb 10 '25 11:02 x-tabdeveloping

I am using Joblib 1.4.2

I can use any version of Python I want for this. If you can share your environment details, I can set mine to match.

lucas-vizru avatar Feb 10 '25 17:02 lucas-vizru

having the same issue

NorthIsUp avatar Feb 24 '25 06:02 NorthIsUp

@NorthIsUp can I get your python, numba, pynndescent, numpy and joblib versions? I'm trying to look into this, but so far I'm failing to reproduce the error.

x-tabdeveloping avatar Feb 24 '25 08:02 x-tabdeveloping

It was the latest of everything and python 3.13

NorthIsUp avatar Feb 24 '25 08:02 NorthIsUp

This is very curious. I've installed the latest versions of all packages along with Python 3.13, and the tests are still passing. Can you share a minimal reproducible example with me?

x-tabdeveloping avatar Feb 24 '25 08:02 x-tabdeveloping

When I'm back at work I'll get the complete dependency list for ya

NorthIsUp avatar Feb 24 '25 16:02 NorthIsUp

I think I see why your tests pass (the test passes for me too): my example uses a different process than your test covers.

Your test covers process = Process(make_pipeline(SubWordVectorizer(), NMF(20))), while I am using process = char_ngram_process(tf_idf=False).
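For concreteness, a sketch of the difference (the import path for char_ngram_process is an assumption; the pipeline-based variant is left as a comment because SubWordVectorizer's import path is not shown in this thread):

```python
# Sketch only, not a full reproduction.
from neofuzz import char_ngram_process

# The existing test covers a pipeline-based process, roughly:
#   Process(make_pipeline(SubWordVectorizer(), NMF(20)))

# The failing example instead uses the char n-gram helper with TF-IDF weighting turned off:
process = char_ngram_process(tf_idf=False)
```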

lucas-vizru avatar Feb 27 '25 20:02 lucas-vizru

Right, okay, I can take a look at that. Though my guess would be that this shouldn't introduce any problems, since the char n-gram process only uses scikit-learn components. Thanks for the tip @lucas-vizru

x-tabdeveloping avatar Feb 28 '25 10:02 x-tabdeveloping

@x-tabdeveloping: Did you find any solution for this? I am also struggling with it. I also tried pickle and marshal, but no success :-(

In my case, I get the exception message "too many positional arguments". I get the same error when I use pickle to save the process directly :-(

I use the following libs: neofuzz 0.3.4, joblib 1.4.2, pynndescent 0.5.13

Is the process object too complex for pickle.dump/load?

schmid-bosch avatar Apr 10 '25 16:04 schmid-bosch

All (@x-tabdeveloping),

After more debugging I discovered that saving and loading the model works as long as you use neofuzz == 0.3.3.

NOTE: The actual problem is that neofuzz 0.3.4 uses pynndescent 0.5.13, which breaks the ability to save the model; with neofuzz 0.3.3 it works.

schmid-bosch avatar Apr 17 '25 17:04 schmid-bosch

Confirmed, the older version of neofuzz works.

I also had to use numpy==1.26.4

That is due to this error: AttributeError: np.infty was removed in the NumPy 2.0 release. Use np.inf instead.

That said, 0.3.4 is happy with NumPy 2.0, so no worries on that front.
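As an aside, the NumPy side of the pin is easy to verify; the versions above (neofuzz 0.3.3, numpy 1.26.4) are the ones reported in this thread, and the snippet below only illustrates the np.infty removal:

```python
# Illustration of the NumPy 2.0 change behind the AttributeError above.
import numpy as np

print(np.__version__)
print(np.inf)      # available in every NumPy release
# print(np.infty)  # still an alias of np.inf on 1.26.x, removed in NumPy 2.0
```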

lucas-vizru avatar Apr 18 '25 19:04 lucas-vizru

Hey all! Sorry for the silence. Unfortunately, it seems there is not much I can do to fix this issue on my end. Multiple people have already raised it with PyNNDescent's developers, but there hasn't been any progress on making indices properly saveable, and there hasn't been a release in quite a few months.

I am considering moving the library to a different backend like FAISS, Chroma, Annlite or Annoy, but it will take time, especially since I'm a little occupied with a bunch of other projects.

I know this is an uncomfortable situation for all of you, but the best I can recommend is downgrading the library.

If any of you by any chance have spare time to submit a PR on a backend switch, I would be more than happy to review your code and help implement it.

x-tabdeveloping avatar Apr 19 '25 16:04 x-tabdeveloping

Actually, it was easier than I expected; I'm working on migrating the library to Annoy in this PR: https://github.com/x-tabdeveloping/neofuzz/pull/17

x-tabdeveloping avatar Apr 19 '25 17:04 x-tabdeveloping

Version 0.4.0 now uses Annoy instead of PyNNDescent, so this issue should be resolved. Feel free to try it out!

x-tabdeveloping avatar Apr 19 '25 18:04 x-tabdeveloping

Thanks for the support!

Library issues SUCK so I really appreciate it man!

lucas-vizru avatar Apr 21 '25 23:04 lucas-vizru

Hi, thanks for your great work!

I have updated to the latest GitHub version, but it doesn't work for me, unfortunately. I get this error:

Cell In[13], line 4
      1 corpus = gene_ont["symbol"].dropna().drop_duplicates().to_list()[1:10]
      3 process = char_ngram_process(metric = 'angular')
----> 4 process.index(corpus)
      6 process.extract("your query", limit=30, refine_levenshtein=True)

File ~/Documents/lamin_test/.venv/lib/python3.12/site-packages/neofuzz/process.py:79, in Process.index(self, options)
     77 self.nearest_neighbours = AnnoyIndex(n_dimensions, self.metric)
     78 for i_option, vector in enumerate(vectors):
---> 79     self.nearest_neighbours.add_item(i_option, vector)
     80 self.nearest_neighbours.build(self.n_trees, n_jobs=self.n_jobs)

File ~/Documents/lamin_test/.venv/lib/python3.12/site-packages/scipy/sparse/_base.py:378, in _spbase.__len__(self)
    377 def __len__(self):
--> 378     raise TypeError("sparse array length is ambiguous; use getnnz()"
    379                     " or shape[0]")

TypeError: sparse array length is ambiguous; use getnnz() or shape[0]
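The traceback suggests the vectorizer is producing a scipy sparse matrix, and AnnoyIndex.add_item cannot take a sparse row directly. A purely hypothetical illustration of the mismatch (not a neofuzz fix) is densifying each row before handing it to Annoy:

```python
# Hypothetical sketch: Annoy needs dense vectors; sparse rows trigger the
# "sparse array length is ambiguous" TypeError seen above.
import numpy as np
from scipy import sparse
from annoy import AnnoyIndex

vectors = sparse.random(5, 16, density=0.3, format="csr", dtype=np.float32)

index = AnnoyIndex(vectors.shape[1], "angular")
for i in range(vectors.shape[0]):
    # densify one row at a time before add_item
    index.add_item(i, np.asarray(vectors[i].todense()).ravel())
index.build(10)
```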

AleksZakirov avatar Apr 22 '25 16:04 AleksZakirov