FRASER icon indicating copy to clipboard operation
FRASER copied to clipboard

Error: checkIntergenic on unlocalized and unplaced sequences

Open jakobjn opened this issue 9 months ago • 2 comments

When using comprehensive gene annotations, unlocalized and unplaced sequences result in an error when attempting to annotate potential impact of splice junctions with annotatePotentialImpact.

Debugging

  1. The error arises since no genes in the txdb are located on unlocalized chr*_random and unplaced chrUn_* sequences.
  2. This causes min(distance(test_junction, refseq.genes), na.rm = TRUE) to evaluate to Inf. https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L817
  3. distance(test_junction, refseq.genes) in turn evaluates to a vector of exclusively NA's.
  4. min(c(NA, NA, ...), na.rm=TRUE) is equivalent to min(numeric(0)), which evaluates to Inf.
  5. As Inf is greater than 0, the conditional dist > 0 evaluates to TRUE, whereas the code block under the if-statement is erroneously executed. https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818-L830

Fix

This error can be resolved by modifying the condition of the if-statement from dist > 0 to is.finite(dist) && dist > 0. https://github.com/c-mertes/FRASER/blob/e775832aef77c14aa3c414832735bd1b0b9a4e47/R/resultAnnotations.R#L818

Error

1. annotatePotentialImpact(result = res_junc_dt, txdb = txdb, fds = fds)
6. addPotentialImpactLabels(annoResult, fds, txdb)
8. sapply(noLaps, function(i) {
 .     overlap <- to(findOverlaps(junctions_gr[i], exons))
 .     if (length(overlap) == 0) {
 .         return(checkIntergenic(junctions_gr, i, refseq.genes))
 .     }
 .     for (j in overlap) {
 .         exon_start = start(exons[j])
 .         exon_end = end(exons[j])
 .         if (exon_start <= start(junctions_gr[i]) & exon_end >= 
 .             end(junctions_gr[i])) {
 .             if ((end(junctions_gr[i]) - start(junctions_gr[i]) + 
 .                 1)%%3 != 0) {
 .                 frs = "likely"
 .             }
 .             else {
 .                 frs = "unlikely"
 .             }
 .             return(c("exonTruncation", frs))
 .         }
 .     }
 .     return(c("complex", "inconclusive"))
 . })
9. lapply(X = X, FUN = FUN, ...)
10. FUN(X[[i]], ...)
11. checkIntergenic(junctions_gr, i, refseq.genes)
12. start(refseq.genes[nearest(junctions_gr[i], refseq.genes)])
13. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
14. refseq.genes[nearest(junctions_gr[i], refseq.genes)]
15. subset_along_ROWS(x, i, drop = drop)
16. extractROWS(x, i)
17. extractROWS(x, i)
18. normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
19. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
20. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, 
  .     allow.NAs = allow.NAs)
21. .subscript_error("subscript contains NAs")
22. stop(wmsg(...), call. = FALSE)
23. .handleSimpleError(function (cond) 
  . .Internal(C_tryCatchHelper(addr, 1L, cond)), "subscript contains NAs", 
  .     base::quote(NULL))
24. h(simpleError(msg, call))

jakobjn avatar Apr 30 '24 08:04 jakobjn

Hi @jakobjn , thanks for reporting this! I will look into this and come back to you. Can you add which FRASER / R / Bionconductor version you are using and provide an example of a txdb object that results in the error?

ischeller avatar May 02 '24 09:05 ischeller

@ischeller I'm using FRASER 1.99.3 (1b0cd9581ba1b1ae2bf329821c3a5823ef6505b7) as a Conda package built from this custom recipe. This is to exclusively utilize Conda for package management.

jakobjn avatar May 03 '24 09:05 jakobjn

Thanks for the report on this. I merged it now. Please let us know if you still encounter issues.

c-mertes avatar May 10 '24 22:05 c-mertes