Using binning causes annotations to fail
I was using the default binning (bin.size=10) in LoadTrackFile with BAM file inputs, which caused the coordinates to be misannotated and led to the following error when adding a geom_transcript layer:
Error in if (nrow(data)) {: argument is of length zero
Traceback:
1. add_ggplot(e1, e2, e2name)
2. ggplot_add(object, p, objectname)
3. ggplot_add.transcript(object, p, objectname)
4. geom_arrows(gene.tx.df.exon, color.by, exon.size, arrow.length,
. arrow.angle, arrow.type)
5. .handleSimpleError(function (cnd)
. {
. watcher$capture_plot_and_output()
. cnd <- sanitize_call(cnd)
. watcher$push(cnd)
. switch(on_error, continue = invokeRestart("eval_continue"),
. stop = invokeRestart("eval_stop"), error = invokeRestart("eval_error",
. cnd))
. }, "argument is of length zero", base::quote(if (nrow(data)) {
. if (!"strand" %in% colnames(data)) {
. data$strand <- "+"
. }
. geom_segment(inherit.aes = TRUE, data = data, mapping = aes(x = .data[["start"]],
. y = .data[["group"]], xend = .data[["end"]], yend = .data[["group"]],
. color = .data[[color]]), arrow = arrow(ends = ifelse(data$strand ==
. "+", "last", "first"), angle = arrow_angle, length = unit(arrow_size,
. "points"), type = arrow_type), lineend = "butt", linejoin = "mitre",
. show.legend = FALSE, linewidth = line_width)
Setting bin.size=NULL fixed the issue.
Hi Ian, can you provide a reproducible example?
Hi @m-jahn let me get a test dataset together from reads from a CCLE cell line that I can share.
Hi @m-jahn , see attached. I included a BAM intersected with EML4 and ALK exons as well as a filtered GTF. You can test with EML4 region "2:42396521-42488653"
I am using ggcoverage version 1.4.1.
thank you! Do you also happen to have the code chunk you were running that produced this error?
I just assumed the params you used for loading your bam file and I can already confirm the bug with the coordinates. Also strand information seems to get missing. To be honest the LoadTrack function is quite cluttered and would need a major rework.
For reproducibility, this is what I tested with your file:
bam <- "~/Downloads/ggcoverage_test_data/SNU-324_EML4_ALK.intersect.bam"
gtf <- "~/Downloads/ggcoverage_test_data/EML4_ALK.filt.gtf"
meta_data <- data.frame(
SampleName = "SNU-324_EML4_ALK.intersect",
Type = "EML4",
Group = "EML4"
)
track_df <- LoadTrackFile(
track.file = bam,
format = "bam",
region = "2:42396521-42488653",
norm.method = "None",
bin.size = 10,
meta.info = meta_data
)
head(track_df)
With the result:
seqnames start end width strand score Type Group
1 2 1 10 10 * 0.000000 EML4 EML4
2 2 1961 1970 10 * 4.333333 EML4 EML4
3 2 1971 1980 10 * 10.500000 EML4 EML4
4 2 1981 1990 10 * 16.000000 EML4 EML4
5 2 1991 2000 10 * 30.333333 EML4 EML4
6 2 2001 2010 10 * 55.666667 EML4 EML4
I'll try to fix some of this. In the mean time I recommend you use the much simpler rtracklayer interface to parse bam files: You can select your chromosome, calculate coverage and turn the result into a data.frame for plotting with ggcoverage.
rtracklayer::import(bam, format = "bam") %>%
coverage()
dear @ijhoskins
you can use the the latest version from this development branch https://github.com/m-jahn/ggcoverage/tree/bug_fixes
and see if it solves your issue. I have tested various fixes with your data and tried to make some functions more robust by adding educated guesses. Some defaults are hard-coded to examples that have no relevance for real users, e.g. the gene.name in geom_transcript.
With your example data I get now:
library(ggcoverage)
bam <- "/home/michael/Downloads/ggcoverage_test_data/SNU-324_EML4_ALK.intersect.bam"
gtf <- "/home/michael/Downloads/ggcoverage_test_data/EML4_ALK.filt.gtf"
meta_data <- data.frame(
SampleName = "SNU-324_EML4_ALK.intersect",
Type = "EML4",
Group = "EML4"
)
# load bam file
track_df <- LoadTrackFile(
track.file = bam,
format = "bam",
norm.method = "None",
bin.size = NULL,
meta.info = meta_data
)
# load genome annotation
gtf_gr <- rtracklayer::import.gff(con = gtf, format = "gtf")
# plot
ggcoverage(
data = track_df,
plot.type = "facet",
range.position = "out",
facet.key = "strand"
) +
geom_gene(gtf.gr = gtf_gr, plot.height = 0.2) +
geom_transcript(gtf.gr = gtf_gr, plot.height = 0.2)
Note that the gene plot is not fully correct because there is no gene annotated in your GTF file. But it still plots with a warning instead of throwing a cryptic error message.