gggenes
gggenes copied to clipboard
Calculate the direction of data obtained from gff file
How could I plot a group of genes in the same directions using gggenes and ggplot2 ?
I have a data obtained from a .gff file, it was generated with prokka, then it was imported with gggenomes::read_feats function, and get the genes that I want to plot, it generate the next data:
df
# A tibble: 12 × 11
file_id seq_id start end strand feat_id locus_tag orientation gene cluster position
<chr> <chr> <int> <int> <chr> <chr> <chr> <int> <chr> <chr> <int>
1 Genome-300 Genome-300_1 199743 201911 - IMEHDJCA_00189 IMEHDJCA_00189 -1 geneE cluster_00365 200827
2 Genome-300 Genome-300_1 201914 203275 - IMEHDJCA_00190 IMEHDJCA_00190 -1 geneD cluster_01313 202594
3 Genome-300 Genome-300_1 203272 205377 - IMEHDJCA_00191 IMEHDJCA_00191 -1 geneB cluster_00403 204324
4 Genome-300 Genome-300_1 205817 206176 + IMEHDJCA_00192 IMEHDJCA_00192 1 gene1033 cluster_06099 205996
5 Genome-300 Genome-300_1 206268 206663 + IMEHDJCA_00193 IMEHDJCA_00193 1 geneC cluster_05858 206465
6 Genome-300 Genome-300_1 206686 222306 + IMEHDJCA_00194 IMEHDJCA_00194 1 geneA cluster_00001 214496
7 Genome-320 Genome-320_8 123699 139310 - ILCJGNBA_01570 ILCJGNBA_01570 -1 geneA cluster_00001 131504
8 Genome-320 Genome-320_8 139333 139728 - ILCJGNBA_01571 ILCJGNBA_01571 -1 geneC cluster_05858 139530
9 Genome-320 Genome-320_8 139820 140179 - ILCJGNBA_01572 ILCJGNBA_01572 -1 gene1033 cluster_06099 139999
10 Genome-320 Genome-320_8 140619 142724 + ILCJGNBA_01573 ILCJGNBA_01573 1 geneB cluster_00403 141671
11 Genome-320 Genome-320_8 142721 144082 + ILCJGNBA_01574 ILCJGNBA_01574 1 geneD cluster_01322 143401
12 Genome-320 Genome-320_8 144085 146253 + ILCJGNBA_01575 ILCJGNBA_01575 1 geneE cluster_00365 145169
plot
ggplot(df, aes(xmin = start, xmax = end, y = file_id, fill = gene)) +
geom_gene_arrow(aes(forward = orientation), arrow_body_height = unit(7, "mm"),
arrowhead_height = unit(7, "mm"),
arrowhead_width = unit(5.7, "mm")) +
scale_fill_manual(values = c("orange", "blue", "red", "green", "yellow", "#878787")) +
facet_wrap(~ file_id, scales = "free", ncol = 1) +
scale_y_discrete(position="right") +
gggenes::theme_genes() +
theme(panel.background = element_rect(fill = 'white' , color = 'white' ),
panel.grid.major.y = ggplot2::element_line(colour = "grey", linewidth = 0.6),
axis.title.y= element_blank(),
axis.text.y = element_text(size = 14, # sample name size
family="Times New Roman",
face="bold"),
plot.margin = unit(c(t=0.3, b=0.3, r=0.3, l=0.001), "mm")
)
When I generate the plot, the genes of each genome are in different direction, is any way to recalculate the position for genes in Genome-320 and be in the same direction than Genome-300 ??
any suggestion ? thanks
Hi @cabraham03, do you want to change the direction of individual genes (this is controlled by the forward
aesthetic), or reverse the direction of the entire locus shown in Genome-320 so that it resembles Genome-300?
Hello. not for a single gene, all the genes in the second plot (Genome-320), I found this solution, but I'm not sure: if I have a full length of 150 bp (just an example), and I have 2 genes, geneA start at 10 (start1) and end at 50 bp (end1), and geneB start at 80 (start1) and end at 130 bp (end1); so, to calculate the genes direction will be: start2 = (total length - end1) +1 end2 = (total length - start1) + 1
so geneA start2= (150-50) + 1=101 end2 = (150-10)+1 = 141
geneB start2 = (150 - 130) + 1 = 21 end2 = (150 - 80) + 1 = 71
is it ok ? thanks !!!
I think the best way to do it would be to draw the genomes separately with the x-axis reversed for Genome-320
, then composite the plots with patchwork. This way you avoid showing incorrect coordinates for the genes.
# Libraries
library(tidyverse)
library(gggenes)
library(patchwork)
# Data
df <- tribble(
~file_id, ~seq_id, ~start, ~end, ~strand, ~feat_id, ~locus_tag, ~orientation, ~gene, ~cluster, ~position,
"Genome-300", "Genome-300_1", 199743, 201911, "-", "IMEHDJCA_00189", "IMEHDJCA_00189", -1, "geneE", "cluster_00365", 200827,
"Genome-300", "Genome-300_1", 201914, 203275, "-", "IMEHDJCA_00190", "IMEHDJCA_00190", -1, "geneD", "cluster_01313", 202594,
"Genome-300", "Genome-300_1", 203272, 205377, "-", "IMEHDJCA_00191", "IMEHDJCA_00191", -1, "geneB", "cluster_00403", 204324,
"Genome-300", "Genome-300_1", 205817, 206176, "+", "IMEHDJCA_00192", "IMEHDJCA_00192", 1, "gene1033", "cluster_06099", 205996,
"Genome-300", "Genome-300_1", 206268, 206663, "+", "IMEHDJCA_00193", "IMEHDJCA_00193", 1, "geneC", "cluster_05858", 206465,
"Genome-300", "Genome-300_1", 206686, 222306, "+", "IMEHDJCA_00194", "IMEHDJCA_00194", 1, "geneA", "cluster_00001", 214496,
"Genome-320", "Genome-320_8", 123699, 139310, "-", "ILCJGNBA_01570", "ILCJGNBA_01570", -1, "geneA", "cluster_00001", 131504,
"Genome-320", "Genome-320_8", 139333, 139728, "-", "ILCJGNBA_01571", "ILCJGNBA_01571", -1, "geneC", "cluster_05858", 139530,
"Genome-320", "Genome-320_8", 139820, 140179, "-", "ILCJGNBA_01572", "ILCJGNBA_01572", -1, "gene1033", "cluster_06099", 139999,
"Genome-320", "Genome-320_8", 140619, 142724, "+", "ILCJGNBA_01573", "ILCJGNBA_01573", 1, "geneB", "cluster_00403", 141671,
"Genome-320", "Genome-320_8", 142721, 144082, "+", "ILCJGNBA_01574", "ILCJGNBA_01574", 1, "geneD", "cluster_01322", 143401,
"Genome-320", "Genome-320_8", 144085, 146253, "+", "ILCJGNBA_01575", "ILCJGNBA_01575", 1, "geneE", "cluster_00365", 145169
)
# Convert orientation to logical
df$orientation <- ifelse(df$orientation == 1, TRUE, FALSE)
# Plot
base <- ggplot(df, aes(xmin = start, xmax = end, y = file_id, fill = gene)) +
scale_fill_manual(values = c("orange", "blue", "red", "green", "yellow", "#878787")) +
scale_y_discrete(position = "right") +
gggenes::theme_genes() +
theme(panel.background = element_rect(fill = 'white' , color = 'white' ),
panel.grid.major.y = ggplot2::element_line(colour = "grey", linewidth = 0.6),
axis.title.y= element_blank(),
axis.text.y = element_text(size = 14, # sample name size
family="Times New Roman",
face="bold")
)
genome300 <- base + geom_gene_arrow(
data = df[which(df$file_id == "Genome-300"), ],
aes(forward = orientation),
arrow_body_height = unit(7, "mm"),
arrowhead_height = unit(7, "mm"),
arrowhead_width = unit(5.7, "mm")
)
genome320 <- base + geom_gene_arrow(
data = df[which(df$file_id == "Genome-320"), ],
aes(forward = orientation),
arrow_body_height = unit(7, "mm"),
arrowhead_height = unit(7, "mm"),
arrowhead_width = unit(5.7, "mm")
) + scale_x_continuous(transform = "reverse")
(genome300 / genome320) + plot_layout(guides = "collect")
Created on 2024-05-28 with reprex v2.1.0
Thanks So Much !!!