ArchR
ArchR copied to clipboard
.testMarkerSC() failing when mean1 and/or mean2 is NA during call to getMarkerFeatures()
Discussed in https://github.com/GreenleafLab/ArchR/discussions/1319
Originally posted by wuv21 March 3, 2022 Hi!
Been running into an issue similar to #562 and #483 with the same error output, but I have been running ArchR v1.0.2 (commit 48dccf) which was the solution to both those questions. As such, I believed that the error was related to something else.
By debugging through the getMarkerFeatures() call (see below for my function call):
tmp <- getMarkerFeatures(
ArchRProj = proj,
useMatrix = "MotifMatrix",
groupBy = "condensedCluster",
testMethod = "wilcoxon",
bias = c("TSSEnrichment", "log10(nFrags)"),
useGroups = "CD4_TRUE",
bgdGroups = "CD4_FALSE",
maxCells = 1000,
verbose = TRUE,
useSeqnames = "z"
)
I noticed that one of the rows in pairwiseDF (see source below) contained NA values for both mean1 and mean2 (small section of data frame shown below). The row "f156" is the only row with this issue. All other rows were ok in the rest of pairwiseDF.
https://github.com/GreenleafLab/ArchR/blob/48dccf361bab6d9191a4d9b61e1317acfa71167c/R/MarkerFeatures.R#L422-L426
log2Mean log2FC fdr pval mean1 mean2 n auc
f155 3.089913 NaN 7.740502e-02 5.694163e-03 0.6838235 -0.2663543 36 0.6898148
f156 0.000000 0.00000000 2.650317e-10 3.046341e-13 NA NA 36 0.0000000
f157 3.705964 0.08330494 9.909369e-01 9.147955e-01 0.3451817 0.3242560 36 0.4922840
Since mean1 and mean2 were NA, idxfilter contained a vector of both boolean values and a NA that was associated with row "f156". I think the fix below would solve this issue:
idxFilter <- rowSums(pairwiseDF[, c("mean1", "mean2")]) != 0 & rowSums(is.na(pairwiseDF[, c("mean1", "mean2")])) == 0
Thanks!
Edit: updated my fix; didn't realize Github changed a piece of code into a link. Edit2: updated fix again.
@wuv21 - I converted this to an Issue post.
I think the most important question is why do mean1 and mean2 get set to NA in the first place? Any chance you can parse that out? I'd rather try to figure out how to prevent the NAs from getting generated than work around their existence.
Thanks @rcorces for moving it into issues and for your reply. I dug deeper and found that this particular motif had no matches in my dataset. I did the following:
# load in the matches rds from getPeakAnnotation()
tmp <- readRDS(getPeakAnnotation(proj, "Motif")$Matches)
# below line returns FALSE; i tried other columns corresponding to other motifs and saw that there were TRUE values
any(tmp@assays@data@listData[["matches"]][, 156])
# double checked to make sure that there were no positions either in motif 156
# other motifs had elements in the GRanges objects
tmp <- readRDS(getPeakAnnotation(proj, "Motif")$Positions)
tmp[[156]]
GRanges object with 0 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
-------
seqinfo: 23 sequences from an unspecified genome; no seqlengths
As such, .getPartialMatrix() returns back a matrix that has NA values associated with that particular motif when it is called in .testMarkersSC().
https://github.com/GreenleafLab/ArchR/blob/968e4421ce7187a8ac7ea1cf6077412126876d5f/R/MarkerFeatures.R#L346-L353
I think this might just be a rare case of this motif not being visible in this current project/biological context (possibly due to the lower cell counts for this current analysis). This error doesn't occur for me with other projects but I do notice that this motif has the smallest number of peaks associated with it compared to all the other motifs.
I have temporarily patched this on a new branch (dev_idxFilter) via https://github.com/GreenleafLab/ArchR/commit/f091e426bbe6f78b149ded4f9b49e86ad4f5640f but I think the better fix is to remove these offending rows from the motif matrix entirely. Will try to address this more thoroughly soon.