rapids-single-cell-examples
OverflowError: value too large to convert to int
Could I ask if you might have any tips on how to overcome this error?
I'm running your 1M cell code on my own dataset of 2.8M cells.
Here's my matrix:
sparse_gpu_array.shape
# (2886934, 33567)
sparse_gpu_array.nnz
# 4128695018
Let's try to run this:
sparse_gpu_array, genes = rapids_scanpy_funcs.filter_genes(sparse_gpu_array, genes, min_cells=1000)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<timed exec> in <module>
~/work/github.com/slowkow/rapids-single-cell-examples/notebooks/rapids_scanpy_funcs.py in filter_genes(sparse_gpu_array, genes_idx, min_cells)
269 Genes containing a number of cells below this value will be filtered
270 """
--> 271 thr = np.asarray(sparse_gpu_array.sum(axis=0) >= min_cells).ravel()
272 filtered_genes = cp.sparse.csr_matrix(sparse_gpu_array[:, thr])
273 genes_idx = genes_idx[np.where(thr)[0]]
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in sum(self, axis, dtype, out)
388
389 if axis == 0:
--> 390 ret = self.T.dot(cupy.ones(m, dtype=self.dtype)).reshape(1, n)
391 else: # axis == 1
392 ret = self.dot(cupy.ones(n, dtype=self.dtype)).reshape(m, 1)
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in dot(self, other)
307 def dot(self, other):
308 """Ordinary dot product"""
--> 309 return self * other
310
311 def getH(self):
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in __mul__(self, other)
111 return self._with_data(self.data * other)
112 elif other.ndim == 1:
--> 113 self.sum_duplicates()
114 if cusparse.check_availability('csrmv'):
115 csrmv = cusparse.csrmv
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/compressed.py in sum_duplicates(self)
333 self._has_canonical_format = True
334 return
--> 335 coo = self.tocoo()
336 coo.sum_duplicates()
337 self.__init__(coo.asformat(self.format))
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in tocoo(self, copy)
214
215 """
--> 216 return self.T.tocoo(copy).T
217
218 def tocsc(self, copy=None):
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csr.py in tocoo(self, copy)
268 indices = self.indices
269
--> 270 return cusparse.csr2coo(self, data, indices)
271
272 def tocsc(self, copy=False):
~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupy/cusparse.py in csr2coo(x, data, indices)
900 cusparse.xcsr2coo(
901 handle, x.indptr.data.ptr, nnz, m, row.data.ptr,
--> 902 cusparse.CUSPARSE_INDEX_BASE_ZERO)
903 # data and indices did not need to be copied already
904 return cupyx.scipy.sparse.coo_matrix(
cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.xcsr2coo()
OverflowError: value too large to convert to int
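The last frame passes nnz to cusparse.xcsr2coo, and the message suggests that a value did not fit into a C int. A quick check, assuming that argument is a signed 32-bit integer, shows that the nnz reported above is nearly twice the largest value such an integer can hold:

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest value a signed 32-bit int can hold

nnz = 4128695018  # sparse_gpu_array.nnz reported above

exceeds_int32 = nnz > INT32_MAX
# Smallest number of batches with equal nnz that keeps each batch within int32.
min_batches = -(-nnz // INT32_MAX)  # ceiling division
print(exceeds_int32, min_batches)  # True 2
```

Under that assumption, any operation that hands the full nnz to such a routine would overflow, whereas the same operation on a slice of the matrix with fewer than 2**31 non-zeros would not.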
Hi @slowkow,
It looks like this issue may have been addressed already in cupy/cupy#4223. We are running into similar problems as we work through upcoming changes to use CuPy 8.0 and move more of the filtering logic onto the GPU.
An option for us to get around the size limitation in the gene filtering step might be to allocate an empty 1-d output array of size n_cells and then perform the sum over a few batches. Take the following as an example to populate the summed array with the sums across the genes for the first 100 cells:
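# Per-cell totals: one output slot per cell (row).
summed_gpu_array = cp.empty(sparse_gpu_array.shape[0], dtype=cp.float32)
# sum(axis=1) adds across genes and returns shape (100, 1); ravel() flattens it.
summed_gpu_array[0:100] = sparse_gpu_array[0:100].sum(axis=1).ravel()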
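For the gene filtering case, filter_genes() needs one total per gene (a sum over axis=0), so the accumulator would have length n_genes and each batch of cells would add into it. The following is a minimal sketch of that idea, not the repository's implementation. The names batched_column_sums, batch_size and xp are hypothetical, SciPy on the CPU stands in for the GPU matrix, and it is an assumption that a cupyx.scipy.sparse CSR matrix of this size supports the same row slicing and .sum(axis=0).

```python
import numpy as np
import scipy.sparse as sp


def batched_column_sums(matrix, batch_size=100_000, xp=np):
    """Sum a cells x genes CSR matrix over axis=0, one row batch at a time.

    Pass xp=cupy for a cupyx.scipy.sparse matrix (assumption: it supports
    the same row slicing and .sum(axis=0) as the SciPy matrix used here).
    """
    n_cells, n_genes = matrix.shape
    totals = xp.zeros(n_genes, dtype=xp.float64)
    for start in range(0, n_cells, batch_size):
        batch = matrix[start:start + batch_size]
        # Each batch holds far fewer non-zeros than the full matrix.
        totals += xp.asarray(batch.sum(axis=0)).ravel()
    return totals


# CPU stand-in data: 1,000 cells x 50 genes, about 10% non-zero.
demo = sp.random(1000, 50, density=0.1, format="csr",
                 random_state=0, dtype=np.float32)
totals = batched_column_sums(demo, batch_size=128)

keep = totals >= 5.0  # hypothetical threshold, playing the role of min_cells
filtered = demo[:, keep]
```

The batch size is a tuning choice: it only needs to be small enough that each slice stays below the non-zero limit and fits comfortably in GPU memory.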
Corey, thanks for the reply! If I eventually get back to this error, I might try to modify your function filter_genes() to perform a sum over multiple batches and see if the code runs from that point onward.
Could I please ask if you have successfully run the RAPIDS analysis on a real dataset that is larger than the 1M cell dataset?