gggenes icon indicating copy to clipboard operation
gggenes copied to clipboard

put gggenes x-axes on common scale

Open svedwards opened this issue 2 years ago • 8 comments

Hi @wilkox,

Thanks for a very nice package. Really helpful. However, I'm having trouble putting all my genomes on a common numerical axis starting from the gene on which they are centered (BRD2, in green). You can see from the plot image that I have 11 genomes aligned on BRD2 but they all start at different points. I could renumber all the genes in the data.frame, and I have tried all the "breaks" and "labels" options, but I often get the error "Error: Breaks and labels are different lengths". So I must be missing something. I have attached the image and the dataframe (=object "reversetab5" in code). The code I used was:

reversetab5<-read.table(file="11_hap_5_gene_tab2_genes_only.txt"

dummies <- make_alignment_dummies( reversetab5, aes(xmin = negstart03, xmax = negend03, y = molecule, id = gene), on = "BRD2" ) p<-ggplot(reversetab5,aes(xmin = negstart03, xmax = negend03, y = molecule, fill = gene, label = gene)) + geom_gene_arrow(arrowhead_height = unit(3, "mm"), arrowhead_width = unit(1, "mm"),colour="blue",size=0) + geom_blank(data = dummies) + facet_wrap(~ molecule, scales = "free", ncol = 1) + scale_fill_brewer(palette = "Set3") + scale_x_continuous(labels = scales::comma) + theme_genes() + theme(axis.text = element_text(size = 10))

ggsave("TEs_segment9.pdf",p, height = 20, width = 10,limitsize = FALSE)

11_hap_5_gene_tab2_genes_only.txt

TEs_segment9.pdf

svedwards avatar Dec 29 '21 17:12 svedwards

Hi Scott, thanks for your kind words about gggenes. I hope I have correctly understood what you are trying to achieve. In the code below I capture the range of gene locations in the variables genes_min and genes_max, then use the limits argument to scale_x_continuous() to set the range of the x-axis for each molecule to these values.

library(ggplot2)
library(gggenes)

reversetab5 <- read.table(file = "11_hap_5_gene_tab2_genes_only.txt", header = TRUE)

genes_min <- min(c(reversetab5$negend03, reversetab5$negend03))
genes_max <- max(c(reversetab5$negend03, reversetab5$negend03))

ggplot(reversetab5, aes(xmin = negstart03, xmax = negend03, y = molecule, 
                        fill = gene, label = gene)) +
  geom_gene_arrow(arrowhead_height = unit(3, "mm"), arrowhead_width = unit(1, "mm"), 
                  colour = "blue", size = 0) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") + 
  scale_x_continuous(labels = scales::comma, limits = c(genes_min, genes_max)) + 
  theme_genes() %+replace% theme(axis.text = element_text(size = 10))

TEs

wilkox avatar Dec 29 '21 23:12 wilkox

Thanks very much, David, this is really helpful. And before I forget, happy new year! This will help me put all the scaffolds on a common numerical scale. What I'd really like to do is have this common scale and have the scaffolds aligned at the BRD2 gene. So it basically requires re-numbering or re-scaling some of the scaffolds. So, basically combining the numbering that you generated (ideally, with 0 at the left, and going positive to right) with the image I uploaded (with BRD2 aligned at left). 0 can be right at BRD2 or nearby, doesn't have to be exact. Let me know if that makes sense. - Scott

svedwards avatar Jan 01 '22 03:01 svedwards

Happy New Year!

The example below 'zeros out' all the gene locations by subtracting the minimum location in each molecule (which in every case is the start of BRD2) from all the location values. It then uses scale_x_continuous() to manually fix the x-axis range between 0 and 420,000. I have loaded the tidyverse to do this as it greatly simplifies the 'zeroing out' calculations but it would also be possible to do this in base R.

# Load libraries
library(tidyverse)
library(gggenes)

# Load the gene data
reversetab5 <- read_delim("11_hap_5_gene_tab2_genes_only.txt")

# For each molecule, zero the gene locations to the beginning of the first
# gene (which in all cases is BRD2)
reversetab5 <- reversetab5 %>%
  group_by(molecule) %>%
  mutate(genes_min = min(c(negstart03, negend03))) %>%
  mutate(negstart03 = negstart03 - genes_min) %>%
  mutate(negend03 = negend03 - genes_min) %>%
  ungroup()

# Draw plot
ggplot(reversetab5, aes(xmin = negstart03, xmax = negend03, y = molecule, 
                        fill = gene, label = gene)) +
  geom_gene_arrow(arrowhead_height = unit(3, "mm"), arrowhead_width = unit(1, "mm"), 
                  colour = "blue", size = 0) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") + 
  scale_x_continuous(labels = scales::comma, limits = c(0, 420000)) + 
  theme_genes() %+replace% theme(axis.text = element_text(size = 10))

TEs

wilkox avatar Jan 03 '22 04:01 wilkox

Wow, David, thanks so much. I have really learned alot in this thread and I appreciate your taking the time to help me get around this snag. The result looks beautiful! Hopefully it will be helpful to others as well! Best for now and I enjoyed reading about your microbial research (especially the ecology/biogeography stuff)

Scott

svedwards avatar Jan 03 '22 23:01 svedwards

I've been taking a similar approach when using gggenes with my own data. (And thanks for the excellent package, by the way.) But I've been wondering - when one does this, is it possible to provide only a single scale (x-axis)? When all the genomic regions are on the same scale, the duplicate scales just add clutter (particularly if there are a large number of regions being visualized).

If it's challenging due to the way gggenes uses ggplot2, I suppose one could hide the axes and make a dummy genomic region with tick-marks made of 1-nt "genes" every kb or something? But I'm hoping there's a more elegant answer.

g-e-kenney avatar Jan 09 '22 02:01 g-e-kenney

@g-e-kenney If all the molecules are on the same scale as in the example I posted above, the plot could just be drawn without faceting (i.e. just delete the facet_wrap() line). ggplot will draw this with a single x-axis scale at the bottom.

wilkox avatar Jan 13 '22 10:01 wilkox

Rather belatedly: hmm, that actually doesn't quite get at what I was hoping for - it brings me down to a single scale, but without the facet_wrap() line, all the genome regions are independently aligned to the (artificial) 0 kb starting point, rather than aligned to the gene of interest. Essentially, I'd like to be able to maintain alignment to a gene of interest (via the dummies function) BUT have some way to generate a single scale bar (with, say, 1 kb tick marks or something). This is important only for relative scaling ("this length = 1 kb") - like the scale bar on the clinker example.

g-e-kenney avatar Feb 19 '22 02:02 g-e-kenney

@g-e-kenney you can put together something kind of similar with patchwork and ggfittext:

library(ggplot2)
library(gggenes)
library(ggfittext)
library(patchwork)

dummies <- make_alignment_dummies(
  example_genes,
  aes(xmin = start, xmax = end, y = molecule, id = gene),
  on = "genE"
)

genomes_plot <- ggplot(example_genes, aes(xmin = start, xmax = end, y = molecule, fill = gene)) +
  geom_gene_arrow() +
  geom_blank(data = dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes() %+replace% theme(
    axis.line.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.x = element_blank()
  )

extent <- dummies$end[1] - dummies$start[1]
scale_bar <- data.frame(start = 0, end = 1000, label = "1 kbp")

scale_bar_plot <- ggplot(scale_bar, aes(xmin = start, xmax = end, label = label, y = 1)) +
  geom_errorbarh() +
  geom_fit_text(grow = TRUE, place = "top") +
  scale_x_continuous(limits = c(0, extent)) +
  theme_genes() %+replace% theme(
    axis.line.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.title = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
  )

genomes_plot / scale_bar_plot + plot_layout(heights = c(9, 1))

Created on 2022-03-13 by the reprex package (v2.0.1)

wilkox avatar Mar 13 '22 05:03 wilkox