
Error `NAs introduced by coercion to integer range` when reading large H5AD

Open · LongpanICR opened this issue 4 months ago · 4 comments

Hi, Thank you for developing this tool to convert data between AnnData and Seurat/SingleCellExperiment. I installed the latest package. When I read the h5ad file into R, I found that adata$X is NULL. I don't know if it's because the matrix is too large (1797611 × 35125), or if there is some other reason. Thank you. Long

adata <- read_h5ad(file.path(dir_output, "adata_obj_raw.h5ad"))
Warning messages:
1: In Matrix::sparseMatrix(j = indices, p = indptr, x = data, dims = shape,  :
  NAs introduced by coercion to integer range
2: Error reading element X of type <csr_matrix>
ℹ 'p' must be a nondecreasing vector c(0, ...)
3: In Matrix::sparseMatrix(j = indices, p = indptr, x = data, dims = shape,  :
  NAs introduced by coercion to integer range
4: Error reading element layers/counts of type <csr_matrix>
ℹ 'p' must be a nondecreasing vector c(0, ...)

adata
AnnData object with n_obs × n_vars = 1797611 × 35125

LongpanICR · Jul 23 '25 16:07

Hi, Here is some updated info: I ran a test and extracted only 3000 cells, so the AnnData file is only 3000 × 35125. Then I re-loaded the file using the above script, and it works with no errors reported, although I did not check very carefully. Best, Long

LongpanICR · Jul 23 '25 17:07

Hi @LongpanICR

Thanks for reporting this. As you have probably guessed, I think you are running into the maximum size of an integer in R. You can check this with:

1797611L * 35125L
# [1] NA
# Warning message:
# In 1797611L * 35125L : NAs produced by integer overflow
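
Computed in doubles, the product does not overflow, which shows how far past R's integer limit this matrix is:

as.double(1797611) * 35125
# [1] 6.314109e+10
.Machine$integer.max
# [1] 2147483647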

We should think about how we can handle this or at least give a better error message. Thanks for confirming it works with a subset of the file.
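
As a rough sketch (a hypothetical helper, not current anndataR code): the 'p' slot of a dgCMatrix is a 32-bit integer vector whose entries are cumulative counts of stored values, so checking the indptr read from the file would let us fail early with a clearer message:

# Hypothetical pre-check before calling Matrix::sparseMatrix(), to fail
# with a clear message instead of the coercion warning above
check_indptr_fits <- function(indptr) {
  n_stored <- max(as.double(indptr))
  if (n_stored > .Machine$integer.max) {
    stop(
      "Matrix has ", format(n_stored, big.mark = ",", scientific = FALSE),
      " stored values, more than a dgCMatrix can index (2^31 - 1)"
    )
  }
  invisible(TRUE)
}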

lazappi · Jul 24 '25 05:07

I can confirm that I also run into this issue with a very large single cell dataset.

bioinfomagician · Aug 13 '25 20:08

We might be able to solve this by switching to SparseArray when 32-bit indexing is not sufficient:

library(SparseArray)

# Dimensions comparable to the matrix in the original report
large_dims <- c(1800000L, 35000L)

large_svt <- SVT_SparseArray(dim = large_dims, type = "double")

# Assignment and extraction work even though the array has more than
# 2^31 - 1 cells in total
large_svt[1000000, 30000] <- 42.0

large_svt[1000000, 30000]
# [1] 42

dim(large_svt)
# [1] 1800000   35000
length(large_svt)
# [1] 6.3e+10
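
For matrices that still fit in a dgCMatrix, the coercion to the SVT representation already exists (a small hedged example; filling an SVT_SparseArray chunk-wise for matrices that never fit in a dgCMatrix would still need to be worked out):

library(Matrix)
library(SparseArray)

# Small matrix for illustration; SparseArray provides the as() coercion
# from dgCMatrix to SVT_SparseArray
m <- rsparsematrix(1000, 500, density = 0.01)
svt <- as(m, "SVT_SparseArray")
nzcount(svt)
# [1] 5000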

rcannood · Aug 25 '25 12:08