
Slice error using mac M1-max ARM

Open thegodone opened this issue 2 years ago • 6 comments

I tried the code on a large dataset (200k x 2.5k) using the latest version, v0.5.10. With either a dense or a sparse dataset, I get an error.

My code:

```python
index = pynndescent.NNDescent(crs_test, metric='cosine')
```

It runs for 10-20 seconds and then fails with this error:


```
ValueError                                Traceback (most recent call last)
File :1

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
    793     print(ts(), "Building RP forest with", str(n_trees), "trees")
    794 self._rp_forest = make_forest(
    795     data,
    796     n_neighbors,
   (...)
    802     self._angular_trees,
    803 )
--> 804 leaf_array = rptree_leaf_array(self._rp_forest)
    805 else:
    806     self._rp_forest = None

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/rp_trees.py:1097, in rptree_leaf_array(rp_forest)
   1095 def rptree_leaf_array(rp_forest):
   1096     if len(rp_forest) > 0:
-> 1097         return np.vstack(rptree_leaf_array_parallel(rp_forest))
   1098     else:
   1099         return np.array([[-1]])

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/pynndescent/rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
   1088 def rptree_leaf_array_parallel(rp_forest):
-> 1089     result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
   1090         joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
   1091     )
   1092     return result

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
   1095 self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve()
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())

File ~/miniforge3/envs/tf/lib/python3.9/multiprocessing/pool.py:771, in ApplyResult.get(self, timeout)
    769     return self._value
    770 else:
--> 771     raise self._value

File ~/miniforge3/envs/tf/lib/python3.9/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
    618 def __call__(self, *args, **kwargs):
    619     try:
--> 620         return self.func(*args, **kwargs)
    621     except KeyboardInterrupt as e:
    622         # We capture the KeyboardInterrupt and reraise it as
    623         # something different, as multiprocessing does not
    624         # interrupt processing for a KeyboardInterrupt
    625         raise WorkerInterrupt() from e

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:288, in BatchedCalls.__call__(self)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289             for func, args, kwargs in self.items]

File ~/miniforge3/envs/tf/lib/python3.9/site-packages/joblib/parallel.py:288, in <listcomp>(.0)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289             for func, args, kwargs in self.items]

ValueError: cannot assign slice from input of different size
```
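For reference, a minimal sketch of the kind of call that hits this. The random sparse matrix below is only a hypothetical stand-in for the real 200k x 2.5k `crs_test` data, so it will not necessarily reproduce the failure on its own:

```python
# Hypothetical stand-in for the crs_test matrix described above
# (~200k rows x 2.5k columns); the issue was seen with both dense and sparse data.
import scipy.sparse as sp
import pynndescent

crs_test = sp.random(200_000, 2_500, density=0.01, format="csr", random_state=42)

# On the affected versions this failed inside rptree_leaf_array() with
# "ValueError: cannot assign slice from input of different size".
index = pynndescent.NNDescent(crs_test, metric="cosine")
```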

thegodone avatar May 10 '23 09:05 thegodone

I found the problem: I did not pass the distance metric.

thegodone avatar May 10 '23 09:05 thegodone

I just got this same error on an x86 machine (n2d-highmem-8 GCP VM) and I'm unclear on what you needed to do to fix this. In any case I think this is a bug, as additional arguments shouldn't be necessary.

edit: Of course, as soon as I comment it starts mysteriously working; it was failing consistently before. I wonder if I had some bad version cached or something.

jamestwebber avatar Jun 15 '23 14:06 jamestwebber

I agree this is odd, and I'll try to keep a lookout for a reproducer.

lmcinnes avatar Jun 15 '23 22:06 lmcinnes

I think I have a reproducer, but I'm not sure how to share it. It seems completely data-specific: I got this error with np.sqrt(X) but not with X (and I don't think it's a dtype issue).

jamestwebber avatar Jun 15 '23 22:06 jamestwebber

I have a sporadic reproducer with a fairly small array (1.8 MB on disk, saved as a NumPy .npz). It seems like this problem was introduced in a recent update. My suspicion is that it comes from an edge case in how the rows are handed out to the n_jobs workers when the split is uneven.

pynndescent_bug_np.npz.zip

edit: The above array seems to fail consistently only when passed through np.sqrt, but right now I don't want to figure out why that is 🙃
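A sketch of the check being described, assuming the attached archive unzips to pynndescent_bug_np.npz and that the array is stored under the key "X" (the actual key, and the metric used, are not stated in the thread):

```python
import numpy as np
import pynndescent

# Assumed key "X"; inspect np.load("pynndescent_bug_np.npz").files to find the real one.
X = np.load("pynndescent_bug_np.npz")["X"]

pynndescent.NNDescent(X)           # reportedly builds fine
pynndescent.NNDescent(np.sqrt(X))  # reportedly raises the slice-size ValueError
```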

jamestwebber avatar Jun 16 '23 14:06 jamestwebber

So this was a quirky one. Some code had been added to bail out when the tree splitting was not working well, to avoid excess depth. Unfortunately that meant that, in rare cases, the size of a leaf could exceed the leaf_size that was set. That made things fail to match up when building the leaf arrays at the end, because we expected every leaf to fit within the leaf size. Now we track a max_leaf_size and expand things in those rare cases. In theory this could blow up terribly for bad data by consuming ungodly amounts of memory, but that's a very rare case indeed, and I'm not sure there is any way to fix it anyway. The best answer in that case is simply to increase the leaf size in the NNDescent params.
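A toy illustration (not pynndescent's actual code) of the mismatch being described: if the rows of the combined leaf array are allocated with width leaf_size, a leaf that grew past that width cannot be written into its row, which is the kind of size error seen in the traceback. Sizing the rows by the largest leaf actually produced (a max_leaf_size) makes everything fit, at the cost of extra memory when leaves balloon:

```python
import numpy as np

leaf_size = 4
# Hypothetical leaves from an RP tree; the last one exceeds leaf_size,
# which is the rare case described above.
leaves = [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11]]

# Rows allocated with width leaf_size, as the pre-fix code effectively assumed.
leaf_array = np.full((len(leaves), leaf_size), -1, dtype=np.int64)
try:
    for i, leaf in enumerate(leaves):
        leaf_array[i, : len(leaf)] = leaf
except ValueError as e:
    # Plain NumPy reports "could not broadcast input array ..."; the numba-compiled
    # code in pynndescent surfaces the equivalent "cannot assign slice from input
    # of different size" seen in the traceback.
    print("oversized leaf:", e)

# The fix amounts to sizing rows by the largest leaf actually produced,
# so every leaf fits.
max_leaf_size = max(len(leaf) for leaf in leaves)
leaf_array = np.full((len(leaves), max_leaf_size), -1, dtype=np.int64)
for i, leaf in enumerate(leaves):
    leaf_array[i, : len(leaf)] = leaf
```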

lmcinnes avatar Aug 01 '23 19:08 lmcinnes