anndataR
Error `NAs introduced by coercion to integer range` when reading large H5AD
Hi, Thank you for developing this tool to convert data between anndata and Seurat/SingleCellExperiment. I installed the latest version of the package. When I read the h5ad file into R, I found that adata.X is NULL. I don't know whether this is because the matrix is too large (1797611 × 35125) or whether there is another reason. Thank you. Long
adata <- read_h5ad(file.path(dir_output, "adata_obj_raw.h5ad"))
Warning messages:
1: In Matrix::sparseMatrix(j = indices, p = indptr, x = data, dims = shape, :
  NAs introduced by coercion to integer range
2: Error reading element X of type <csr_matrix>
ℹ 'p' must be a nondecreasing vector c(0, ...)
3: In Matrix::sparseMatrix(j = indices, p = indptr, x = data, dims = shape, :
  NAs introduced by coercion to integer range
4: Error reading element layers/counts of type <csr_matrix>
ℹ 'p' must be a nondecreasing vector c(0, ...)
adata
AnnData object with n_obs × n_vars = 1797611 × 35125
Hi, Here is an update: I ran a test in which I extracted only 3000 cells, so the adata file is only 3000 × 35125. I then re-loaded the file using the script above. It works and no errors are reported, although I have not checked the result very carefully. Best, Long
Hi @LongpanUPC
Thanks for reporting this. As you have probably guessed, I think you are running into the limit for an integer in R. You can check this with:
1797611L * 35125L
# [1] NA
# Warning message:
# In 1797611L * 35125L : NAs produced by integer overflow
We should think about how we can handle this or at least give a better error message. Thanks for confirming it works with a subset of the file.
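As a rough sketch of the limit involved, using base R only: the same product computed in double precision does not overflow, and coercing a double above `.Machine$integer.max` yields `NA` with the same warning text that appears in the report. The value `3e9` is an arbitrary illustration, and the assumption that the file's `indptr` values exceed the integer range has not been verified against the actual file.

```r
n_obs <- 1797611
n_vars <- 35125

# Plain numeric literals are doubles, so this product does not overflow.
n_elements <- n_obs * n_vars
n_elements > .Machine$integer.max
# [1] TRUE

# Coercing a double above the integer limit gives NA. Without
# suppressWarnings() this emits:
#   NAs introduced by coercion to integer range
# 3e9 is an arbitrary stand-in for a large indptr value (assumption).
too_big <- suppressWarnings(as.integer(3e9))
is.na(too_big)
# [1] TRUE
```

If the `indptr` vector of the CSR matrix contains such values, the resulting `NA`s would also explain the follow-up message that 'p' must be a nondecreasing vector.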
I can confirm that I also run into this issue with a very large single-cell dataset.
We might be able to solve this by switching to SparseArray when 32-bit indexing is not sufficient:
library(SparseArray)
large_dims <- c(1800000L, 35000L)
large_svt <- SVT_SparseArray(dim = large_dims, type = "double")
large_svt[1000000, 30000] <- 42.0
large_svt[1000000, 30000]
dim(large_svt)
# [1] 1800000 35000
length(large_svt)
# [1] 6.3e+10
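To decide when such a switch would be needed, a reader could check the CSR index pointer before building the matrix. The helper below is a minimal sketch in base R. `exceeds_int_range()` is a hypothetical name and is not part of anndataR, Matrix, or SparseArray. It assumes `indptr` and `shape` have been read as doubles (or as integers that fit), so that the values themselves are not already `NA`.

```r
# Hypothetical helper, not part of anndataR, Matrix, or SparseArray.
# Returns TRUE when the number of stored values (last entry of indptr)
# or either dimension is too large for R's 32-bit integers.
exceeds_int_range <- function(indptr, shape) {
  int_max <- .Machine$integer.max
  nnz <- as.numeric(indptr[length(indptr)])
  nnz > int_max || any(as.numeric(shape) > int_max)
}

# Small matrix: 5 stored values, fits comfortably.
exceeds_int_range(indptr = c(0, 2, 5), shape = c(2, 10))
# [1] FALSE

# Made-up indptr whose last entry is above the integer limit.
exceeds_int_range(indptr = c(0, 2e9, 3e9), shape = c(2, 10))
# [1] TRUE
```

A reader could use a check like this to fall back to a different backend or to raise a clearer error message, rather than letting the coercion warning surface from inside the matrix constructor.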