gggenes icon indicating copy to clipboard operation
gggenes copied to clipboard

Calculate the direction of data obtained from gff file

Open cabraham03 opened this issue 1 month ago • 1 comments

How could I plot a group of genes in the same directions using gggenes and ggplot2 ?

I have a data obtained from a .gff file, it was generated with prokka, then it was imported with gggenomes::read_feats function, and get the genes that I want to plot, it generate the next data:

df
# A tibble: 12 × 11
   file_id    seq_id        start    end strand feat_id        locus_tag      orientation gene     cluster       position
   <chr>      <chr>         <int>  <int> <chr>  <chr>          <chr>                <int> <chr>    <chr>            <int>
 1 Genome-300 Genome-300_1 199743 201911 -      IMEHDJCA_00189 IMEHDJCA_00189          -1 geneE    cluster_00365   200827
 2 Genome-300 Genome-300_1 201914 203275 -      IMEHDJCA_00190 IMEHDJCA_00190          -1 geneD    cluster_01313   202594
 3 Genome-300 Genome-300_1 203272 205377 -      IMEHDJCA_00191 IMEHDJCA_00191          -1 geneB    cluster_00403   204324
 4 Genome-300 Genome-300_1 205817 206176 +      IMEHDJCA_00192 IMEHDJCA_00192           1 gene1033 cluster_06099   205996
 5 Genome-300 Genome-300_1 206268 206663 +      IMEHDJCA_00193 IMEHDJCA_00193           1 geneC    cluster_05858   206465
 6 Genome-300 Genome-300_1 206686 222306 +      IMEHDJCA_00194 IMEHDJCA_00194           1 geneA    cluster_00001   214496
 7 Genome-320 Genome-320_8 123699 139310 -      ILCJGNBA_01570 ILCJGNBA_01570          -1 geneA    cluster_00001   131504
 8 Genome-320 Genome-320_8 139333 139728 -      ILCJGNBA_01571 ILCJGNBA_01571          -1 geneC    cluster_05858   139530
 9 Genome-320 Genome-320_8 139820 140179 -      ILCJGNBA_01572 ILCJGNBA_01572          -1 gene1033 cluster_06099   139999
10 Genome-320 Genome-320_8 140619 142724 +      ILCJGNBA_01573 ILCJGNBA_01573           1 geneB    cluster_00403   141671
11 Genome-320 Genome-320_8 142721 144082 +      ILCJGNBA_01574 ILCJGNBA_01574           1 geneD    cluster_01322   143401
12 Genome-320 Genome-320_8 144085 146253 +      ILCJGNBA_01575 ILCJGNBA_01575           1 geneE    cluster_00365   145169

plot

ggplot(df, aes(xmin = start, xmax = end, y = file_id, fill = gene)) +
    geom_gene_arrow(aes(forward = orientation), arrow_body_height = unit(7, "mm"), 
                    arrowhead_height = unit(7, "mm"), 
                    arrowhead_width = unit(5.7, "mm")) + 
    scale_fill_manual(values = c("orange", "blue", "red",  "green", "yellow", "#878787")) +
    facet_wrap(~ file_id, scales = "free", ncol = 1) +
    scale_y_discrete(position="right") +
    gggenes::theme_genes() +
    theme(panel.background = element_rect(fill = 'white' , color = 'white' ),
          panel.grid.major.y = ggplot2::element_line(colour = "grey", linewidth = 0.6),
          axis.title.y= element_blank(), 
          axis.text.y = element_text(size = 14, # sample name size
                                     family="Times New Roman", 
                                     face="bold"),
          plot.margin = unit(c(t=0.3, b=0.3, r=0.3, l=0.001), "mm") 
    )

genes2

When I generate the plot, the genes of each genome are in different direction, is any way to recalculate the position for genes in Genome-320 and be in the same direction than Genome-300 ??

any suggestion ? thanks

cabraham03 avatar May 17 '24 16:05 cabraham03

Hi @cabraham03, do you want to change the direction of individual genes (this is controlled by the forward aesthetic), or reverse the direction of the entire locus shown in Genome-320 so that it resembles Genome-300?

wilkox avatar May 20 '24 23:05 wilkox

Hello. not for a single gene, all the genes in the second plot (Genome-320), I found this solution, but I'm not sure: if I have a full length of 150 bp (just an example), and I have 2 genes, geneA start at 10 (start1) and end at 50 bp (end1), and geneB start at 80 (start1) and end at 130 bp (end1); so, to calculate the genes direction will be: start2 = (total length - end1) +1 end2 = (total length - start1) + 1

so geneA start2= (150-50) + 1=101 end2 = (150-10)+1 = 141

geneB start2 = (150 - 130) + 1 = 21 end2 = (150 - 80) + 1 = 71

is it ok ? thanks !!!

cabraham03 avatar May 27 '24 16:05 cabraham03

I think the best way to do it would be to draw the genomes separately with the x-axis reversed for Genome-320, then composite the plots with patchwork. This way you avoid showing incorrect coordinates for the genes.

# Libraries
library(tidyverse)
library(gggenes)
library(patchwork)

# Data
df <- tribble(
  ~file_id,     ~seq_id,        ~start, ~end,   ~strand, ~feat_id,         ~locus_tag,       ~orientation, ~gene,      ~cluster,        ~position,
  "Genome-300", "Genome-300_1", 199743, 201911, "-",     "IMEHDJCA_00189", "IMEHDJCA_00189", -1,           "geneE",    "cluster_00365", 200827,
  "Genome-300", "Genome-300_1", 201914, 203275, "-",     "IMEHDJCA_00190", "IMEHDJCA_00190", -1,           "geneD",    "cluster_01313", 202594,
  "Genome-300", "Genome-300_1", 203272, 205377, "-",     "IMEHDJCA_00191", "IMEHDJCA_00191", -1,           "geneB",    "cluster_00403", 204324,
  "Genome-300", "Genome-300_1", 205817, 206176, "+",     "IMEHDJCA_00192", "IMEHDJCA_00192",  1,           "gene1033", "cluster_06099", 205996,
  "Genome-300", "Genome-300_1", 206268, 206663, "+",     "IMEHDJCA_00193", "IMEHDJCA_00193",  1,           "geneC",    "cluster_05858", 206465,
  "Genome-300", "Genome-300_1", 206686, 222306, "+",     "IMEHDJCA_00194", "IMEHDJCA_00194",  1,           "geneA",    "cluster_00001", 214496,
  "Genome-320", "Genome-320_8", 123699, 139310, "-",     "ILCJGNBA_01570", "ILCJGNBA_01570", -1,           "geneA",    "cluster_00001", 131504,
  "Genome-320", "Genome-320_8", 139333, 139728, "-",     "ILCJGNBA_01571", "ILCJGNBA_01571", -1,           "geneC",    "cluster_05858", 139530,
  "Genome-320", "Genome-320_8", 139820, 140179, "-",     "ILCJGNBA_01572", "ILCJGNBA_01572", -1,           "gene1033", "cluster_06099", 139999,
  "Genome-320", "Genome-320_8", 140619, 142724, "+",     "ILCJGNBA_01573", "ILCJGNBA_01573",  1,           "geneB",    "cluster_00403", 141671,
  "Genome-320", "Genome-320_8", 142721, 144082, "+",     "ILCJGNBA_01574", "ILCJGNBA_01574",  1,           "geneD",    "cluster_01322", 143401,
  "Genome-320", "Genome-320_8", 144085, 146253, "+",     "ILCJGNBA_01575", "ILCJGNBA_01575",  1,           "geneE",    "cluster_00365", 145169
)

# Convert orientation to logical
df$orientation <- ifelse(df$orientation == 1, TRUE, FALSE)

# Plot
base <- ggplot(df, aes(xmin = start, xmax = end, y = file_id, fill = gene)) +
  scale_fill_manual(values = c("orange", "blue", "red",  "green", "yellow", "#878787")) +
  scale_y_discrete(position = "right") +
  gggenes::theme_genes() +
  theme(panel.background = element_rect(fill = 'white' , color = 'white' ),
        panel.grid.major.y = ggplot2::element_line(colour = "grey", linewidth = 0.6),
        axis.title.y= element_blank(), 
        axis.text.y = element_text(size = 14, # sample name size
                                   family="Times New Roman", 
                                   face="bold")
  )

genome300 <- base + geom_gene_arrow(
  data = df[which(df$file_id == "Genome-300"), ],
  aes(forward = orientation),
  arrow_body_height = unit(7, "mm"),
  arrowhead_height = unit(7, "mm"),
  arrowhead_width = unit(5.7, "mm")
)

genome320 <- base + geom_gene_arrow(
  data = df[which(df$file_id == "Genome-320"), ],
  aes(forward = orientation),
  arrow_body_height = unit(7, "mm"),
  arrowhead_height = unit(7, "mm"),
  arrowhead_width = unit(5.7, "mm")
) + scale_x_continuous(transform = "reverse")

(genome300 / genome320) + plot_layout(guides = "collect")

Created on 2024-05-28 with reprex v2.1.0

wilkox avatar May 28 '24 02:05 wilkox

Thanks So Much !!!

cabraham03 avatar May 28 '24 21:05 cabraham03