umap
ValueError in transform method after fitting on > 4095 samples
After fitting UMAP on a dataset with more than 4095 samples, calling the transform() method on a different set of data raises an error.
How to reproduce the error:
import numpy as np
import umap
reducer = umap.UMAP()
reducer.fit(np.random.rand(4096, 4000))  # 4096 training samples
reducer.transform(np.random.rand(500, 4000))  # raises ValueError
Meanwhile, if the model is fitted on 4095 samples or fewer, everything works fine:
reducer = umap.UMAP()
reducer.fit(np.random.rand(4095, 4000))  # 4095 training samples
reducer.transform(np.random.rand(500, 4000))  # works
Note: I have pynndescent==0.4.7 installed and umap-learn==0.4.5.
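Because the thread later traces this failure to a scipy change, it helps to report the scipy version alongside the umap-learn and pynndescent ones. A minimal sketch for collecting them, assuming Python 3.8+ for importlib.metadata (the reporter's Python 3.7 environment would need the importlib_metadata backport instead). The helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_versions(packages=("umap-learn", "pynndescent", "scipy", "numpy")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Note that these are distribution names as used by pip (umap-learn), not import names (umap).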
Traceback
ValueError Traceback (most recent call last)
~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069 dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070 elif _HAVE_PYNNDESCENT:
-> 2071 indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072 elif self._sparse_data:
   2073 if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1216 # query_data = check_array(query_data, dtype=np.float64, order='C')
   1217 query_data = np.asarray(query_data).astype(np.float32, order="C")
-> 1218 self._init_search_graph()
   1219 result = search(
   1220 query_data,

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
   1065
   1066 # Get rid of any -1 index entries
-> 1067 self._search_graph = self._search_graph.tocsr()
   1068 self._search_graph.data[self._search_graph.indices == -1] = 0.0
   1069 self._search_graph.eliminate_zeros()

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/scipy/sparse/lil.py in tocsr(self, copy)
   460 indptr = np.empty(M + 1, dtype=idx_dtype)
   461 indptr[0] = 0
--> 462 _csparsetools.lil_get_lengths(self.rows, indptr[1:])
   463 np.cumsum(indptr, out=indptr)
   464 nnz = indptr[-1]
_csparsetools.pyx in scipy.sparse._csparsetools.lil_get_lengths()
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
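The traceback shows that transform() has a branch working from a precomputed distance matrix (the submatrix(dmat_shortened, ...) line) ahead of the pynndescent query that fails. Assuming that branch is taken for training sets below 4096 samples (consistent with the 4095/4096 boundary in the report, but not confirmed in this thread), the small-data path amounts to an exact nearest-neighbour search, which never touches pynndescent's search graph. A rough numpy sketch of that dispatch, assuming Euclidean distance (UMAP's default metric); the names SMALL_DATA_THRESHOLD, uses_exact_path and exact_knn are hypothetical, not umap-learn's API.

```python
import numpy as np

# Assumed cutoff, inferred from the 4095/4096 boundary in the report.
SMALL_DATA_THRESHOLD = 4096


def uses_exact_path(n_train, threshold=SMALL_DATA_THRESHOLD):
    """True if (by assumption) transform() would use exact distances."""
    return n_train < threshold


def exact_knn(train, query, k):
    """Brute-force k nearest neighbours under Euclidean distance.

    Returns (indices, dists), each of shape (n_query, k), sorted by distance.
    """
    train = np.asarray(train, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Squared distances via |a|^2 + |b|^2 - 2ab, avoiding a 3-D difference array.
    sq = (
        (query ** 2).sum(axis=1)[:, None]
        + (train ** 2).sum(axis=1)[None, :]
        - 2.0 * query @ train.T
    )
    dmat = np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from rounding
    # Column indices of the k smallest distances in each row.
    indices = np.argsort(dmat, axis=1, kind="stable")[:, :k]
    dists = np.take_along_axis(dmat, indices, axis=1)
    return indices, dists
```

For 4096 or more training samples this sketch's uses_exact_path returns False, which is where umap-learn 0.4.x instead calls self._rp_forest.query(X, self.n_neighbors) and hits the error above.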
This may or may not be relevant, but if I run the transform() method again, I get a different error:
Traceback
AttributeError Traceback (most recent call last)
~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069 dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070 elif _HAVE_PYNNDESCENT:
-> 2071 indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072 elif self._sparse_data:
   2073 if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1221 k,
   1222 self._raw_data,
-> 1223 self._search_forest,
   1224 self._search_graph.indptr,
   1225 self._search_graph.indices,
AttributeError: 'NNDescent' object has no attribute '_search_forest'
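The second error is consistent with the first call having failed partway through pynndescent's lazy _init_search_graph: the first traceback shows self._search_graph being reassigned at the failing line, while _search_forest, needed at line 1223, is evidently never set. One plausible reading (not confirmed in the thread) is that an initialization guard sees the partially built object on the next call and skips the rest. The toy classes here are hypothetical stand-ins, not pynndescent code; they reproduce that pattern and show the usual remedy of storing state only after every step succeeds.

```python
class FragileIndex:
    """Hypothetical toy index: the init guard attribute is set before a step that can fail."""

    def __init__(self, fail_once=True):
        self._fail_once = fail_once

    def _convert_graph(self):
        # Stands in for a conversion such as lil -> csr; fails only on the first call.
        if self._fail_once:
            self._fail_once = False
            raise ValueError("Buffer has wrong number of dimensions (expected 1, got 2)")
        return "csr graph"

    def _init_search_graph(self):
        if hasattr(self, "_search_graph"):
            return  # treated as "already initialized"
        self._search_graph = "lil graph"  # guard attribute now exists...
        self._search_graph = self._convert_graph()  # ...even if this raises
        self._search_forest = "forest"  # skipped when the line above raises

    def query(self):
        self._init_search_graph()
        return self._search_forest


class SafeIndex(FragileIndex):
    """Same toy, but attributes are assigned only after every step has succeeded."""

    def _init_search_graph(self):
        if hasattr(self, "_search_graph"):
            return
        graph = self._convert_graph()  # may raise; nothing stored on self yet
        forest = "forest"
        self._search_graph, self._search_forest = graph, forest
```

With FragileIndex, a first query() raises ValueError and a second raises AttributeError, matching the sequence in the report; SafeIndex raises ValueError once and then succeeds on retry.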
The latest scipy release changed the internals of scipy.sparse lil_matrix handling, which broke some things. I have been working to catch all the issues for the last week or so. There are now umap-learn 0.4.6 and pynndescent 0.4.8 releases that will hopefully resolve these issues. If they don't, let me know, as there are probably still a few uncaught cases somewhere.
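The first traceback shows pynndescent building its search graph as a lil_matrix, converting it with tocsr(), and then zeroing and eliminating entries whose neighbour index is -1. As a hedged illustration of building that kind of k-nearest-neighbour graph through public scipy.sparse constructors only, without relying on lil_matrix internals, here is a sketch. It is not the actual pynndescent fix, and the function name knn_graph_to_csr is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def knn_graph_to_csr(indices, dists, n_vertices=None):
    """Build a sparse kNN graph from (n, k) neighbour-index and distance arrays.

    Entries whose neighbour index is -1 (no neighbour found) are dropped up
    front, the same intent as the -1 clean-up in _init_search_graph.
    """
    indices = np.asarray(indices)
    dists = np.asarray(dists, dtype=np.float32)
    n_rows, k = indices.shape
    if n_vertices is None:
        n_vertices = n_rows
    rows = np.repeat(np.arange(n_rows), k)  # row i repeated k times
    cols = indices.ravel()
    vals = dists.ravel()
    keep = cols >= 0  # discard missing neighbours
    return csr_matrix(
        (vals[keep], (rows[keep], cols[keep])), shape=(n_rows, n_vertices)
    )
```

Constructing from (data, (row, col)) triplets sidesteps the rows/data object arrays that lil_matrix.tocsr() reads, which is where the reported ValueError was raised.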
@lmcinnes which version of scipy is safe to use with older UMAP?
Also, how can we get UMAP 0.4.6? Installing from PyPI or master only gives 0.4.5.
@lambdaofgod SciPy 1.4.1 seems to work just fine.
@lambdaofgod Sorry -- it is coming.
The problem still persists with the latest versions of scipy and pynndescent installed.
This exact error and stack trace shouldn't be happening anymore. Can you give more details on the exact error you are currently seeing?
The error went away when I updated all the libraries in my environment. I don't know which library was causing the issue. Before that, I had tried updating only scipy and pynndescent, but the error was still there.
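The thread offers two ways out: the releases the maintainer expected to fix the problem (umap-learn 0.4.6 and pynndescent 0.4.8), or the scipy 1.4.1 reported as working with the older UMAP. The last reply suggests that upgrading scipy and pynndescent alone was not enough, so a quick sanity check should look at umap-learn too. A minimal sketch; the helper names are hypothetical, and the thresholds come only from this thread, not from any release notes.

```python
import re


def release_tuple(ver):
    """Leading numeric release part of a version string, e.g. '1.4.1rc1' -> (1, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def looks_compatible(umap_ver, pynndescent_ver, scipy_ver):
    """Heuristic based only on the versions mentioned in this thread."""
    # Both fixed releases must be present; upgrading only one may not help.
    fixed = (
        release_tuple(umap_ver) >= (0, 4, 6)
        and release_tuple(pynndescent_ver) >= (0, 4, 8)
    )
    # scipy 1.4.1 was reported as working with the older UMAP.
    old_scipy = release_tuple(scipy_ver) <= (1, 4, 1)
    return fixed or old_scipy
```

This does not know which scipy release introduced the lil_matrix change; it only encodes the combinations reported to work here.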