gggenomes icon indicating copy to clipboard operation
gggenomes copied to clipboard

flip() function for genes containing introns

Open xvtyzn opened this issue 2 years ago • 3 comments

Hi again,

I faced the problem of flip function against genes containing intron. This is from public database.

tmp_gg <- read_gff3("tmp.gff3.txt") %>%
  rename_seq() %>%
  unchop(parent_ids) %>%
  gggenomes() + 
  geom_gene(aes(fill=type, group=parent_ids), size = 5)  +
  scale_fill_brewer(palette = "Set1")

tmp_gg

cd551f25-2564-417c-a067-83cbd0f70345

tmp_gg %>% flip_seqs(4)

cc495749-aaa6-4b36-9582-cb7dfd2af960

The exon does not reverse correctly.

rename_seq function is the following self-made function. It is essentially run to recognize genes that are on the same genome as different bins. It is probably specific to this data and not a common function.

rename_seq <- function(gff_data, gene_list = NA){
  if(!is.na(gene_list)){
    renamed_genes <- gff_data %>%
      dplyr::filter(parent_ids %in% gene_list | 
                      feat_id %in% gene_list) 
  } else {
    renamed_genes <- gff_data %>%
      arrange(seq_id) 
  }
  
  renamed_gff <- renamed_genes %>%
    mutate(seq_ids_psuedo = ifelse(grepl(".m1$",parent_ids),
                                 str_replace(parent_ids, 
                                             pattern=".m1$", replacement=""), 
                                 parent_ids)) %>%
    mutate(seq_id = seq_ids_psuedo) %>%
    unchop(seq_id)
  return(renamed_gff)
}

I guess it may be related to #65, but I don't know the details. Thanks in advance.

Keigo

tmp.gff3.txt

xvtyzn avatar Oct 15 '21 12:10 xvtyzn

Yep, that is definitely off! Right now, there is good and bad news.

Good news, I'm testing a fix for #65, which seems to work and should make your life easier (will upload soon).

# read_gff3 now can handle Augustus CDS
g0 <- read_gff3("issue_91/tmp.gff3.txt", fix_augustus_cds = TRUE)
# rename_seq() a bit more concise ;)
g1 <- g0 %>% mutate(seq_id = str_remove(map(parent_ids, 1), ".m1$"))

gggenomes(g1) + 
  geom_gene(aes(fill=type), size = 5)  +
  scale_fill_brewer(palette = "Set1")

image

Bad news, fixing flip() needs some more thought. The problem are the exons. All other features can simply be flipped by toggling their strand. This is also true for CDS because I merge all parts from one CDS into one feature internally. However, exons stay independent features, so instead of flipping all exons of one gene together (like all CDS parts), each exon is flipped on its own, which doesn't give the intended outcome. Need to ponder...

thackl avatar Oct 15 '21 19:10 thackl

Wow, thanks a lot.

Solving the Augustus problem is definitely great for me. Thanks for telling me about the str_remove function too!

OK, it’s seem to be difficult to take exon into account...

xvtyzn avatar Oct 16 '21 01:10 xvtyzn

Hope the exon flipping issue got fixed someday. That will be really useful to illustrate the introns of eukaryotic genomes.

biubiu-1 avatar Nov 09 '23 14:11 biubiu-1