gggenomes
gggenomes copied to clipboard
Scale on ribbon
Hi, lately I have been playing around with the amazing gggenomes library. I'm interested in visualizing gene counts within non overlapping sliding windows. Similar to the GC content example in the documentation I wanted to plot the data in a ribbon above and below the chromosomes / genomic regions plus some extra ribbons representing alternative treatments.
Is there a way to show a scale next to the ribbons that I drew in this picture?
I know there are some overlaps that I still have to work out. The scale next to the ribbons would be one step closer to a perfect plot :) It would be amazing if someone could help me out with this!
Below is some example code to show how I generated the red ribbon plot. I added all the other ribbons in the same way adjusting their y value and their fill color.
seqs = fasta index file read in with read_fai() mapping = paf file generated with minimap2 read in with read_paf() gene_cluster = bed file with genes per window count read in with read_bed()
p1<-gggenomes(seqs = seqs, links = mapping)+
geom_seq()+
geom_link(aes(fill=blast, color=blast), alpha=0.8, offset = c(0.3, 0.3))+
p2<- p1 %>% add_feats(gene_cluster)+
new_scale("color")+
geom_ribbon(aes(x=(x+xend)/2,
ymax=y,
ymin=y-(0.03*score),
group=seq_id,
linetype="Genes"),
feats(gene_cluster),
fill="red",
linetype=1,
size=0.1,
color="black",
alpha=.5)
continuing like this for all the ribbons shown.
Cheers, Daniel
Fancy! Good question. What kind of scale do you mean? A complete axis, or a small scale bar with fixed size for comparison. Maybe akin to #85 geom_scale_bar but on the y-axis matching the ribbon scale?
On a side note, check out gggenomes::geom_wiggle()
which I've added a while back. It wraps geom_ribbon()
, so that you don't need to add all the y,ymax,x,xend, ...
.
Thanks a lot for the geom_wiggle() tip! Indeed makes it much easier to plot the ribbons. I tried the "line" and "linerange" option which gives you a legend for the plotted data. This is already pretty nice!
Below is the "line" version as I think the "linerange" is a bit confusing in a whole genome context. To elaborate the graph a bit: The upper ribbon is actual annotated genes of a gene family and the lower part is expressed genes of the same gene family from IsoSeq data.
However, the problem I encounter is that each ribbon gets its own legend scale. It should be a shared one among all the ribbons.
I guess an added dedicated axis for each ribbon would be the easiest for people to understand what they are looking at there.
Thank you for your help!
Cheers, Daniel
A little update to this. I had some time to play around with this and I got a bit further with the gggenomes builtin aesthetics:
- First I draw the bare genome
- Then I add the feature track plus a bar plot
- To this I add the axis information plus a continuous scale that takes the max from the added feature track.
p_chr <- gggenomes(seqs = genome)+
geom_seq()
p_chr %>% add_feats(cluster)+
geom_bar(aes(x=(x+xend)/2,
y = score),
feats(cluster),
fill="orange",
linetype=1,
size=0.1,
alpha=1,
stat = "identity")+
theme(axis.line.y.left = element_line(),
axis.ticks.y = element_line(),
axis.text.y = element_text(),
axis.line.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank())+
scale_y_continuous(limits = c(0, max(cluster$score)),breaks = seq(0,max(cluster$score), 2))
My question would be if there is a way to move the bare genome to 0. It seems to sit always at 1 at the Y-axis. I haven't tried to add this to a bigger gggenomes plot.
Here's a solution, although granted, not the prettiest ...
p1 <- gggenomes(emale_genes, emale_seqs) |> add_clusters(emale_cogs) +
geom_seq() + geom_gene()
# default plot with first genome at y==1
p1 + theme_bw()
p2 <- p1
# change y-coords of seqs by hand
p2 <- set_seqs(p2, get_seqs(p2) |> mutate(y=y-1))
# relayout other features (genes, etc)
# BUG: this should work but ignore_seqs not propaged
# p2 <- layout(p2, ignore_seqs=TRUE)
# this works
p2$data <- layout(p2$data, ignore_seqs=TRUE)
# plot with genomes shifted on y axis, i.e. first genomes at y==0
p2 + theme_bw()