ArchR .testMarkerSC() failing when mean1 and/or mean2 is NA during call to getMarkerFeatures()

Discussed in https://github.com/GreenleafLab/ArchR/discussions/1319

^{Originally posted by wuv21 March 3, 2022} Hi!

Been running into an issue similar to #562 and #483 with the same error output, but I have been running ArchR v1.0.2 (commit 48dccf) which was the solution to both those questions. As such, I believed that the error was related to something else.

By debugging through the getMarkerFeatures() call (see below for my function call):

tmp <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "MotifMatrix",
  groupBy = "condensedCluster",
  testMethod = "wilcoxon",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  useGroups = "CD4_TRUE",
  bgdGroups = "CD4_FALSE",
  maxCells = 1000,
  verbose = TRUE,
  useSeqnames = "z"
)

I noticed that one of the rows in pairwiseDF (see source below) contained NA values for both mean1 and mean2 (small section of data frame shown below). The row "f156" is the only row with this issue. All other rows were ok in the rest of pairwiseDF. https://github.com/GreenleafLab/ArchR/blob/48dccf361bab6d9191a4d9b61e1317acfa71167c/R/MarkerFeatures.R#L422-L426

     log2Mean     log2FC          fdr         pval     mean1      mean2  n       auc
f155 3.089913        NaN 7.740502e-02 5.694163e-03 0.6838235 -0.2663543 36 0.6898148
f156 0.000000 0.00000000 2.650317e-10 3.046341e-13        NA         NA 36 0.0000000
f157 3.705964 0.08330494 9.909369e-01 9.147955e-01 0.3451817  0.3242560 36 0.4922840

Since mean1 and mean2 were NA, idxfilter contained a vector of both boolean values and a NA that was associated with row "f156". I think the fix below would solve this issue:

  idxFilter <- rowSums(pairwiseDF[, c("mean1", "mean2")]) != 0 & rowSums(is.na(pairwiseDF[, c("mean1", "mean2")])) == 0

Thanks!

Edit: updated my fix; didn't realize Github changed a piece of code into a link. Edit2: updated fix again.

Mar 04 '22 04:03 rcorces

@wuv21 - I converted this to an Issue post.

I think the most important question is why do mean1 and mean2 get set to NA in the first place? Any chance you can parse that out? I'd rather try to figure out how to prevent the NAs from getting generated than work around their existence.

Mar 04 '22 04:03 rcorces

Thanks @rcorces for moving it into issues and for your reply. I dug deeper and found that this particular motif had no matches in my dataset. I did the following:

# load in the matches rds from getPeakAnnotation()
tmp <- readRDS(getPeakAnnotation(proj, "Motif")$Matches)

# below line returns FALSE; i tried other columns corresponding to other motifs and saw that there were TRUE values
any(tmp@assays@data@listData[["matches"]][, 156])

# double checked to make sure that there were no positions either in motif 156
# other motifs had elements in the GRanges objects
tmp <- readRDS(getPeakAnnotation(proj, "Motif")$Positions)

tmp[[156]]

GRanges object with 0 ranges and 1 metadata column:
   seqnames    ranges strand |     score
      <Rle> <IRanges>  <Rle> | <numeric>
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

As such, .getPartialMatrix() returns back a matrix that has NA values associated with that particular motif when it is called in .testMarkersSC().

https://github.com/GreenleafLab/ArchR/blob/968e4421ce7187a8ac7ea1cf6077412126876d5f/R/MarkerFeatures.R#L346-L353

I think this might just be a rare case of this motif not being visible in this current project/biological context (possibly due to the lower cell counts for this current analysis). This error doesn't occur for me with other projects but I do notice that this motif has the smallest number of peaks associated with it compared to all the other motifs.

Mar 04 '22 14:03 wuv21

I have temporarily patched this on a new branch (dev_idxFilter) via https://github.com/GreenleafLab/ArchR/commit/f091e426bbe6f78b149ded4f9b49e86ad4f5640f but I think the better fix is to remove these offending rows from the motif matrix entirely. Will try to address this more thoroughly soon.

Mar 25 '22 18:03 rcorces