ggcoverage icon indicating copy to clipboard operation
ggcoverage copied to clipboard

Using binning causes annotations to fail

Open ijhoskins opened this issue 11 months ago • 6 comments

I was using the default binning (bin.size=10) in LoadTrackFile with BAM file inputs, which caused the coordinates to be misannotated and led to the following error when adding a geom_transcript layer:

Error in if (nrow(data)) {: argument is of length zero
Traceback:

1. add_ggplot(e1, e2, e2name)
2. ggplot_add(object, p, objectname)
3. ggplot_add.transcript(object, p, objectname)
4. geom_arrows(gene.tx.df.exon, color.by, exon.size, arrow.length, 
 .     arrow.angle, arrow.type)
5. .handleSimpleError(function (cnd) 
 . {
 .     watcher$capture_plot_and_output()
 .     cnd <- sanitize_call(cnd)
 .     watcher$push(cnd)
 .     switch(on_error, continue = invokeRestart("eval_continue"), 
 .         stop = invokeRestart("eval_stop"), error = invokeRestart("eval_error", 
 .             cnd))
 . }, "argument is of length zero", base::quote(if (nrow(data)) {
 .     if (!"strand" %in% colnames(data)) {
 .         data$strand <- "+"
 .     }
 .     geom_segment(inherit.aes = TRUE, data = data, mapping = aes(x = .data[["start"]], 
 .         y = .data[["group"]], xend = .data[["end"]], yend = .data[["group"]], 
 .         color = .data[[color]]), arrow = arrow(ends = ifelse(data$strand == 
 .         "+", "last", "first"), angle = arrow_angle, length = unit(arrow_size, 
 .         "points"), type = arrow_type), lineend = "butt", linejoin = "mitre", 
 .         show.legend = FALSE, linewidth = line_width)

Setting bin.size=NULL fixed the issue.

ijhoskins avatar Jan 22 '25 03:01 ijhoskins

Hi Ian, can you provide a reproducible example?

m-jahn avatar Jan 22 '25 08:01 m-jahn

Hi @m-jahn let me get a test dataset together from reads from a CCLE cell line that I can share.

ijhoskins avatar Jan 22 '25 19:01 ijhoskins

Hi @m-jahn , see attached. I included a BAM intersected with EML4 and ALK exons as well as a filtered GTF. You can test with EML4 region "2:42396521-42488653"

I am using ggcoverage version 1.4.1.

ggcoverage_test_data.tar.gz

ijhoskins avatar Jan 22 '25 21:01 ijhoskins

thank you! Do you also happen to have the code chunk you were running that produced this error?

m-jahn avatar Jan 23 '25 13:01 m-jahn

I just assumed the params you used for loading your bam file and I can already confirm the bug with the coordinates. Also strand information seems to get missing. To be honest the LoadTrack function is quite cluttered and would need a major rework.

For reproducibility, this is what I tested with your file:

bam <- "~/Downloads/ggcoverage_test_data/SNU-324_EML4_ALK.intersect.bam"
gtf <- "~/Downloads/ggcoverage_test_data/EML4_ALK.filt.gtf"

meta_data <- data.frame(
  SampleName = "SNU-324_EML4_ALK.intersect",
  Type = "EML4",
  Group = "EML4"
)

track_df <- LoadTrackFile(
  track.file = bam,
  format = "bam",
  region = "2:42396521-42488653",
  norm.method = "None",
  bin.size = 10,
  meta.info = meta_data
)

head(track_df)

With the result:

  seqnames start  end width strand     score Type Group
1        2     1   10    10      *  0.000000 EML4  EML4
2        2  1961 1970    10      *  4.333333 EML4  EML4
3        2  1971 1980    10      * 10.500000 EML4  EML4
4        2  1981 1990    10      * 16.000000 EML4  EML4
5        2  1991 2000    10      * 30.333333 EML4  EML4
6        2  2001 2010    10      * 55.666667 EML4  EML4

I'll try to fix some of this. In the mean time I recommend you use the much simpler rtracklayer interface to parse bam files: You can select your chromosome, calculate coverage and turn the result into a data.frame for plotting with ggcoverage.

rtracklayer::import(bam, format = "bam") %>% 
  coverage()

m-jahn avatar Jan 23 '25 14:01 m-jahn

dear @ijhoskins you can use the the latest version from this development branch https://github.com/m-jahn/ggcoverage/tree/bug_fixes and see if it solves your issue. I have tested various fixes with your data and tried to make some functions more robust by adding educated guesses. Some defaults are hard-coded to examples that have no relevance for real users, e.g. the gene.name in geom_transcript.

With your example data I get now:

library(ggcoverage)

bam <- "/home/michael/Downloads/ggcoverage_test_data/SNU-324_EML4_ALK.intersect.bam"
gtf <- "/home/michael/Downloads/ggcoverage_test_data/EML4_ALK.filt.gtf"

meta_data <- data.frame(
  SampleName = "SNU-324_EML4_ALK.intersect",
  Type = "EML4",
  Group = "EML4"
)

# load bam file
track_df <- LoadTrackFile(
  track.file = bam,
  format = "bam",
  norm.method = "None",
  bin.size = NULL,
  meta.info = meta_data
)

# load genome annotation
gtf_gr <- rtracklayer::import.gff(con = gtf, format = "gtf")

# plot
ggcoverage(
  data = track_df,
  plot.type = "facet",
  range.position = "out",
  facet.key = "strand"
) +
  geom_gene(gtf.gr = gtf_gr, plot.height = 0.2) +
  geom_transcript(gtf.gr = gtf_gr, plot.height = 0.2)

Image

Note that the gene plot is not fully correct because there is no gene annotated in your GTF file. But it still plots with a warning instead of throwing a cryptic error message.

m-jahn avatar Jan 31 '25 16:01 m-jahn