
ValueError in transform method after fitting on > 4095 samples

Open AlexandreAbreu8 opened this issue 5 years ago • 9 comments

After fitting UMAP with a dataset with more than 4095 samples, if I then use the transform() method on a different set of data an error occurs.

How to reproduce the error:

import numpy as np
import umap

reducer = umap.UMAP()
reducer.fit(np.random.rand(4096, 4000))        # 4096 training samples
reducer.transform(np.random.rand(500, 4000))   # raises ValueError

Meanwhile, if we fit the model with 4095 samples or fewer, everything works just fine:

reducer = umap.UMAP()
reducer.fit(np.random.rand(4095, 4000))        # 4095 training samples
reducer.transform(np.random.rand(500, 4000))   # works

Note: I have pynndescent==0.4.7 installed and umap-learn==0.4.5.
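For anyone comparing their environment against this report, here is a minimal sketch for listing the relevant installed versions. It assumes Python 3.8 or newer for `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport would be needed instead), and the helper name `installed_versions` is illustrative, not part of umap or pynndescent.

```python
from importlib import metadata

# Distribution names as published on PyPI; the selection is an assumption
# based on the packages that appear in this report.
PACKAGES = ("umap-learn", "pynndescent", "scipy")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Including this output in a report makes it easier to tell whether the scipy version is the one that matters.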

Traceback


ValueError                                Traceback (most recent call last)
 in
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1216         # query_data = check_array(query_data, dtype=np.float64, order='C')
   1217         query_data = np.asarray(query_data).astype(np.float32, order="C")
-> 1218         self._init_search_graph()
   1219         result = search(
   1220             query_data,

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
   1065
   1066         # Get rid of any -1 index entries
-> 1067         self._search_graph = self._search_graph.tocsr()
   1068         self._search_graph.data[self._search_graph.indices == -1] = 0.0
   1069         self._search_graph.eliminate_zeros()

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/scipy/sparse/lil.py in tocsr(self, copy)
    460         indptr = np.empty(M + 1, dtype=idx_dtype)
    461         indptr[0] = 0
--> 462         _csparsetools.lil_get_lengths(self.rows, indptr[1:])
    463         np.cumsum(indptr, out=indptr)
    464         nnz = indptr[-1]

_csparsetools.pyx in scipy.sparse._csparsetools.lil_get_lengths()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

This may or may not be relevant, but if I run the transform() method again I get a different error:

Traceback


AttributeError                            Traceback (most recent call last)
 in
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1221             k,
   1222             self._raw_data,
-> 1223             self._search_forest,
   1224             self._search_graph.indptr,
   1225             self._search_graph.indices,

AttributeError: 'NNDescent' object has no attribute '_search_forest'

AlexandreAbreu8, Jul 01 '20 15:07

There was a change in the internals of scipy sparse lil_matrix handling in the latest scipy release that broke some things. I have been working to catch all the issues for the last week or so. There is now a umap-learn 0.4.6 and pynndescent 0.4.8 that will hopefully resolve these issues. If they don't, let me know, as there are probably still a few uncaught cases here somewhere.

lmcinnes, Jul 01 '20 15:07
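The fixed releases named in the comment above (umap-learn 0.4.6 and pynndescent 0.4.8) can be checked against an environment with a rough sketch like the following. The helper names `version_tuple` and `outdated` are hypothetical, and the version comparison is deliberately simplistic (numeric dotted parts only); `packaging.version.Version` would be the more robust choice if the `packaging` library is available.

```python
from importlib import metadata

# Minimum versions taken from the maintainer's comment above.
MINIMUMS = {"umap-learn": "0.4.6", "pynndescent": "0.4.8"}


def version_tuple(version):
    """Convert '0.4.6' to (0, 4, 6); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def outdated(minimums=MINIMUMS, lookup=metadata.version):
    """Return {name: installed version} for packages below their minimum."""
    stale = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except (metadata.PackageNotFoundError, KeyError):
            continue  # not installed, nothing to compare
        if version_tuple(installed) < version_tuple(minimum):
            stale[name] = installed
    return stale


if __name__ == "__main__":
    for name, installed in outdated().items():
        print(f"{name} {installed} is older than {MINIMUMS[name]}")
```

The `lookup` parameter exists only so the check can be exercised with a plain dictionary instead of the real environment.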

@lmcinnes which version of scipy is safe to use with older UMAP?

Also, how can we get UMAP 0.4.6? Installing from PyPI or master only gets 0.4.5.

lambdaofgod, Jul 02 '20 09:07

@lambdaofgod scipy version 1.4.1 seems to be working just fine.

AlexandreAbreu8, Jul 02 '20 11:07

@lambdaofgod Sorry -- it is coming.

lmcinnes, Jul 02 '20 15:07

The problem still persists with the latest versions of scipy and pynndescent installed.

100rab-S, Sep 25 '21 08:09

This exact error and stack trace shouldn't be happening anymore. Can you give more details on the exact error you are currently seeing?

lmcinnes, Sep 27 '21 16:09

The error went away when I updated all the libraries in my environment. I don't know which library was causing the issue. Before updating everything, I tried updating only scipy and pynndescent, but the error was still there.

100rab-S, Sep 28 '21 09:09
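When a blanket upgrade makes an error disappear, as in the last comment, one way to narrow down the responsible library is to save `pip freeze` output before and after the upgrade and compare the two. A minimal sketch, assuming both snapshots are available as text; the function names and the example file names are hypothetical:

```python
def parse_freeze(text):
    """Parse 'name==version' lines from `pip freeze` output into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip comments, editable installs, and URL requirements
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def changed_packages(before, after):
    """Return {name: (old, new)} for packages whose version differs."""
    old, new = parse_freeze(before), parse_freeze(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Hypothetical file names: snapshots saved with `pip freeze > before.txt`
    # and `pip freeze > after.txt` around the upgrade.
    with open("before.txt") as f_old, open("after.txt") as f_new:
        for name, (old, new) in changed_packages(f_old.read(), f_new.read()).items():
            print(f"{name}: {old} -> {new}")
```

A value of `None` on either side means the package was only present in one of the two snapshots.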