
Segmentation fault with `DiffusionMap`

Open lucygarner opened this issue 1 year ago • 4 comments

Hi,

I am getting a segmentation fault with DiffusionMap. I am not changing any of the defaults.

  1. When inputting a data matrix as data, I get the following error:

*** caught segfault ***
address 0x7f7675a30cb0, cause 'memory not mapped'
Error: Could not call find_knn. Consider specifying knn_params = list(M = <larger number>).
Original error: long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537

  2. When inputting a SingleCellExperiment object as data, I get the following error:

*** caught segfault ***
address 0x7f9dc6fb2cb0, cause 'memory not mapped'

Traceback:
 1: knn_asym(data, k, distance)
 2: knn.covertree::find_knn(data, k, query = query, distance = distance, sym = sym)
 3: (function (data, k, ..., query = NULL, distance = c("euclidean", "cosine", "rankcor", "l2"), method = c("covertree", "hnsw"), sym = TRUE, verbose = FALSE) {
    p <- utils::modifyList(formals(RcppHNSW::hnsw_knn), list(...))
    method <- match.arg(method)
    distance <- match.arg(distance)
    if (!is.double(data)) {
        warning("find_knn does not yet support sparse matrices, converting data to a dense matrix.")
        data <- as.matrix(data)
    }
    if (method == "covertree") {
        return(knn.covertree::find_knn(data, k, query = query, distance = distance, sym = sym))
    }
    if (distance == "rankcor") {
        distance <- "cosine"
        data <- rank_mat(data)
        if (!is.null(query)) query <- rank_mat(query)
    }
    if (is.null(query)) {
        knn <- hnsw_knn(data, k + 1L, distance, M = p$M, ef_construction = p$ef_construction, ef = p$ef, verbose = verbose)
        knn$idx <- knn$idx[, -1, drop = FALSE]
        knn$dist <- knn$dist[, -1, drop = FALSE]
    } else {
        index <- hnsw_build(data, distance, M = p$M, ef = p$ef_construction, verbose = verbose)
        knn <- hnsw_search(query, index, k, ef = p$ef, verbose = verbose)
    }
    names(knn)[[1L]] <- "index"
    knn$dist_mat <- sparseMatrix(rep(seq_len(nrow(knn$index)), k), as.vector(knn$index), x = as.vector(knn$dist), dims = c(nrow(if (is.null(query)) data else query), nrow(data)))
    if (is.null(query)) {
        if (sym) knn$dist_mat <- symmetricise(knn$dist_mat)
        nms <- rownames(data)
    } else {
        nms <- rownames(query)
    }
    rownames(knn$dist_mat) <- rownames(knn$index) <- rownames(knn$dist) <- nms
    colnames(knn$dist_mat) <- rownames(data)
    knn
})(new("dgCMatrix", i = c(11854L, 32418L, 46422L, 42L, 100L, 173L, 285L, 293L, 419L, 504L, 629L, 694L, 743L, 777L, 835L, 1122L, 1183L, 1214L, 1259L, 1318L, 1382L, 1389L, 1402L, 1407L, 1655L, 1738L, 1779L, 1997L, 2008L, 2018L, 2023L, 2060L, 2204L, 2241L, 2416L, 2500L, 2558L, 2635L, 2690L, 2701L, 2715L, 2738L, 2742L, 2908L, 2982L, 3118L, 3119L, 3153L, 3311L, 3420L, 3566L, 
3605L, 3691L, 3695L, 3715L, 3759L, 4015L, 4108L, 4164L, 4209L, 4260L, 4307L, 4319L, 4373L, 4649L, 4672L, 4702L, 4860L, 5361L, 5426L, 5593L, 5595L, 5638L, 5643L, 5675L, 5791L, 5934L, 5937L, 5942L, 6441L, 6442L, 6604L, 6714L, 6731L, 6740L, 6800L, 6844L, 6881L, 6906L, 6954L, 6984L, 7027L, 7033L, 7099L, 7177L, 7196L, 7260L, 7343L, 7356L, 7376L, 7569L, 7688L, 7831L, 7952L, 8024L, 8071L, 8097L, 8128L, 8131L, 8179L, 8207L, 8216L, 8444L, 8503L, 8527L, 8698L, 8718L, 8776L, 8820L, 8856L, 8987L, 8994L, 9116L, 9362L, 9363L, 9383L, 9449L, 9631L, 9686L, 9714L, 9750L, 9826L, 9873L, 10063L, 10079L, 10392L, 10400L, 10469L, 10504L, 10579L, 10600L, 10646L, 10866L, 10961L, 11055L, 11501L, 11511L, 11671L, 11780L, 11823L, 12115L, 12134L, 12242L, 12290L, 12353L, 12411L, 12544L, 12571L, 12890L, 12982L, 13013L, 13019L, 13029L, 13193L, 13259L, 13497L, 13548L, 13646L, 13704L, 13820L, 13896L, 13922L, 14016L, 14026L, 14045L, 14135L, 14158L, 14213L, 14221L, 14280L, 14368L, 14376L, 14390L, 14527L, 14598L, 14776L, 14850L, 14910L, 14942L, 15176L, 15356L, 15496L, 15505L, 15507L, 15566L, 15792L, 15824L, 15842L, 15951L, 16007L, 16331L, 16340L, 16345L, 16352L, 16406L, 16416L, 16471L, 16595L, 16656L, 16785L, 16869L, 16880L, 17217L, 17392L, 17461L, 17579L, 17582L, 17897L, 17948L, 18031L, 18195L, 18331L, 18378L, 18456L, 18459L, 18560L, 18590L, 18657L, 18820L, 18851L, 19034L, 19073L, 19181L, 19403L, 19689L, 19800L, 19851L, 19866L, 19918L, 19967L, 20026L, 20101L, 20104L, 20180L, 20225L, 20262L, 20549L, 20666L, 20737L, 20900L, 21116L, 21412L, 21725L, 21749L

I assume both errors are down to the large size of my data (~100,000 cells x ~20,000 genes), and that the best approach would be to input PCA scores rather than the normalised expression values? Or is there another way around this?

Best wishes, Lucy

lucygarner, Mar 10 '23 15:03

Hi! I think you might be right:

long vectors not supported yet

might mean that your data is stored as a long vector, and something can’t deal with this.

Which R version are you using? If the Rinlinedfuns.h from your version is identical to the current trunk version, the error happens in this line: https://github.com/wch/r-source/blob/dac7eca95d50285a12addcf74ca42d82fc2bfe9b/src/include/Rinlinedfuns.h#L537. That looks weird: we can’t get the length of it?

What structure does your data matrix have? I assume it’s a sparse matrix, but it still has that many entries?
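
To put the long-vector limit in numbers, here is a rough sketch in base R. It assumes the round figures from the report (100,000 cells, 20,000 genes); the real dimensions are only approximate, and `is_long_dense` is a hypothetical helper, not part of destiny. A regular R vector holds at most 2^31 - 1 elements, and the find_knn shown in the traceback calls as.matrix() on any non-double input, so what matters is the size of the dense matrix.

```r
# Largest length of a regular (non-long) R vector: 2^31 - 1
max_len <- .Machine$integer.max

# Hypothetical helper: would a dense matrix of this shape be a long vector?
is_long_dense <- function(n_rows, n_cols) {
  # as.numeric() avoids integer overflow in the product
  as.numeric(n_rows) * as.numeric(n_cols) > max_len
}

n_cells <- 1e5
n_genes <- 2e4  # round figures; the reported dimensions are approximate

n_cells * n_genes                # number of elements in the dense matrix
is_long_dense(n_cells, n_genes)  # just under the limit with these round figures
is_long_dense(n_cells, 21475)    # a few more genes push it over
n_cells * n_genes * 8 / 2^30     # size in GiB when stored as doubles
```

So with 100,000 cells, anything above roughly 21,474 genes becomes a long vector once densified, and even below that the dense matrix needs about 15 GiB.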

flying-sheep, Mar 10 '23 16:03

Hi,

No, it's a dense matrix. Are sparse matrices accepted? I was looking at dataset_extract_doublematrix (https://github.com/theislab/destiny/blob/master/R/dataset-helpers.r) and it appears to require either a matrix, data.frame, ExpressionSet, or SingleCellExperiment object.

Unless I use as.matrix() to convert my dgCMatrix into a dense matrix, is.matrix() gives FALSE.

I am using R 4.2.0 and destiny 3.12.0.

Best wishes, Lucy

lucygarner, Mar 10 '23 17:03

Oh! Focusing on scanpy must have led to me neglecting to finish convenient sparse matrix support here. I’m sorry!

What you can do is use the distance matrix support. If you specify a “sparse distance matrix”* as the distance parameter and NULL or a covariate data frame as data, destiny will skip doing the KNN search itself.

covariates <- data.frame(...)  # cell metadata
dists <- N2R::Knn(data)  # I think N2R supports sparse data, but I don’t know
dm <- DiffusionMap(covariates, distance = dists)

*It’s a bit of an awkward format, as the non-specified entries in such a sparse matrix don’t stand for 0, but for “unknown large distance”.
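
As an illustration of that format, here is a minimal sketch that builds such a matrix by hand on toy data. It assumes a brute-force neighbour search (only sensible for tiny inputs) and uses the same `sparseMatrix(rep(...), as.vector(index), x = as.vector(dist))` layout that appears in the find_knn source in the traceback above. The variable names are made up for the example, and nothing here has been checked against destiny itself.

```r
library(Matrix)

set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20)  # toy data: 20 cells x 3 features
k <- 4L

# Brute-force Euclidean distances; fine for a toy example only
d <- as.matrix(dist(x))
diag(d) <- Inf  # exclude each cell from its own neighbour list

# For every cell, the indices and distances of its k nearest neighbours
idx <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))
nn_dist <- t(sapply(seq_len(nrow(d)), function(i) d[i, idx[i, ]]))

# Sparse matrix storing one distance per (cell, neighbour) pair.
# Entries that are not stored mean "unknown large distance", not 0.
knn_dists <- sparseMatrix(
  i = rep(seq_len(nrow(idx)), k),
  j = as.vector(idx),
  x = as.vector(nn_dist),
  dims = c(nrow(x), nrow(x))
)
```

Only k entries per row are stored here. Note that the find_knn in the traceback also symmetrises its result when `sym = TRUE`; this sketch skips that step.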

flying-sheep, Mar 11 '23 13:03

Thank you. I tried to run N2R::Knn on my "dgCMatrix" (normalised expression), but got an error.

Error in n2Knn(m = m, k = k, nThreads = nThreads, verbose = verbose, indexType = indexType, : Not compatible with requested type: [type=S4; target=double].

I have got DiffusionMap working with PCA embeddings as data, so I will try this for now.
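
For anyone landing here, a minimal sketch of that PCA route on toy data. It assumes base R's prcomp() and 10 components, both chosen only for illustration; the call that was actually used is not shown in this thread. The destiny call is guarded so the snippet still runs where destiny is not installed.

```r
set.seed(1)
expr <- matrix(rnorm(200 * 50), nrow = 200)  # toy stand-in: 200 cells x 50 genes

# Reduce to a small dense matrix of PC scores (cells x components)
pca <- prcomp(expr, center = TRUE, scale. = FALSE, rank. = 10)
pcs <- pca$x

# Pass the PC scores as data instead of the full expression matrix
if (requireNamespace("destiny", quietly = TRUE)) {
  dm <- destiny::DiffusionMap(pcs)
}
```

For a sparse matrix of the size discussed above, prcomp() itself would need the dense matrix, so a truncated PCA that works on sparse input would be the likely substitute; that is not shown here.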

lucygarner, Mar 18 '23 17:03