gggenomes
gggenomes copied to clipboard
flip() function for genes containing introns
Hi again,
I faced the problem of flip function against genes containing intron. This is from public database.
tmp_gg <- read_gff3("tmp.gff3.txt") %>%
rename_seq() %>%
unchop(parent_ids) %>%
gggenomes() +
geom_gene(aes(fill=type, group=parent_ids), size = 5) +
scale_fill_brewer(palette = "Set1")
tmp_gg
tmp_gg %>% flip_seqs(4)
The exon does not reverse correctly.
rename_seq function is the following self-made function. It is essentially run to recognize genes that are on the same genome as different bins. It is probably specific to this data and not a common function.
rename_seq <- function(gff_data, gene_list = NA){
if(!is.na(gene_list)){
renamed_genes <- gff_data %>%
dplyr::filter(parent_ids %in% gene_list |
feat_id %in% gene_list)
} else {
renamed_genes <- gff_data %>%
arrange(seq_id)
}
renamed_gff <- renamed_genes %>%
mutate(seq_ids_psuedo = ifelse(grepl(".m1$",parent_ids),
str_replace(parent_ids,
pattern=".m1$", replacement=""),
parent_ids)) %>%
mutate(seq_id = seq_ids_psuedo) %>%
unchop(seq_id)
return(renamed_gff)
}
I guess it may be related to #65, but I don't know the details. Thanks in advance.
Keigo
Yep, that is definitely off! Right now, there is good and bad news.
Good news, I'm testing a fix for #65, which seems to work and should make your life easier (will upload soon).
# read_gff3 now can handle Augustus CDS
g0 <- read_gff3("issue_91/tmp.gff3.txt", fix_augustus_cds = TRUE)
# rename_seq() a bit more concise ;)
g1 <- g0 %>% mutate(seq_id = str_remove(map(parent_ids, 1), ".m1$"))
gggenomes(g1) +
geom_gene(aes(fill=type), size = 5) +
scale_fill_brewer(palette = "Set1")
Bad news, fixing flip()
needs some more thought. The problem are the exons. All other features can simply be flipped by toggling their strand
. This is also true for CDS because I merge all parts from one CDS into one feature internally. However, exons stay independent features, so instead of flipping all exons of one gene together (like all CDS parts), each exon is flipped on its own, which doesn't give the intended outcome. Need to ponder...
Wow, thanks a lot.
Solving the Augustus problem is definitely great for me. Thanks for telling me about the str_remove function too!
OK, it’s seem to be difficult to take exon into account...
Hope the exon flipping issue got fixed someday. That will be really useful to illustrate the introns of eukaryotic genomes.