BRAKER
Overlapping genes in gtf
Hi,
While inspecting my BRAKER output, I noticed something strange: some genes in the genome annotation overlap. Here is an example:
ptg000001l GUSHR gene 4658185 4661128 . - . gene_id "g40370";
ptg000001l AUGUSTUS CDS 4658374 4658736 0.83 - 0 transcript_id "g40370.t1"; gene_id "g40370";
ptg000001l GUSHR gene 4658185 4661128 . + . gene_id "g40371";
ptg000001l AUGUSTUS CDS 4658808 4658895 0.82 + 0 transcript_id "g40371.t1"; gene_id "g40371";
ptg000001l AUGUSTUS CDS 4659004 4659245 0.82 + 2 transcript_id "g40371.t1"; gene_id "g40371";
ptg000001l GUSHR gene 4658185 4661128 . - . gene_id "g40372";
ptg000001l AUGUSTUS CDS 4659267 4659576 0.68 - 1 transcript_id "g40372.t1"; gene_id "g40372";
ptg000001l AUGUSTUS CDS 4659784 4659887 0.29 - 0 transcript_id "g40372.t1"; gene_id "g40372";
In most cases the CDS features are at different locations and only the gene features overlap (in the example above, all three genes have identical coordinates). In some cases, however, the CDS features overlap as well:
ptg000005l GUSHR gene 4281555 4285704 . - . gene_id "g53981";
ptg000005l AUGUSTUS CDS 4281555 4281748 0.97 - 2 transcript_id "g53981.t1"; gene_id "g53981";
ptg000005l AUGUSTUS CDS 4284732 4284936 0.62 - 0 transcript_id "g53981.t1"; gene_id "g53981";
ptg000005l AUGUSTUS CDS 4285639 4285704 0.65 - 0 transcript_id "g53981.t1"; gene_id "g53981";
ptg000005l GUSHR gene 4281555 4285704 . - . gene_id "g54006";
ptg000005l AUGUSTUS CDS 4281555 4281748 0.96 - 2 transcript_id "g54006.t1"; gene_id "g54006";
ptg000005l AUGUSTUS CDS 4284732 4284936 0.6 - 0 transcript_id "g54006.t1"; gene_id "g54006";
ptg000005l AUGUSTUS CDS 4285639 4285704 0.72 - 0 transcript_id "g54006.t1"; gene_id "g54006";
These are the exact same gene mapped under two different names.
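For anyone who wants to check their own output for this kind of duplication, here is a minimal Python sketch. It assumes a tab-delimited GTF with `transcript_id "..."` attributes on the CDS lines, as in the excerpt above; the function name `find_duplicate_transcripts` and the file name in the usage comment are hypothetical, and this is not part of BRAKER.

```python
import re
from collections import defaultdict


def find_duplicate_transcripts(gtf_lines):
    """Return groups of transcript IDs whose CDS features are identical.

    Two transcripts count as duplicates when they sit on the same sequence
    and strand and have exactly the same set of CDS (start, end) intervals.
    Assumes tab-delimited GTF lines (an assumption about the input).
    """
    cds_by_transcript = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))  # seqname, strand, transcript
        cds_by_transcript[key].append((int(fields[3]), int(fields[4])))

    # Group transcripts by their full CDS structure.
    groups = defaultdict(list)
    for (seqname, strand, transcript_id), intervals in cds_by_transcript.items():
        groups[(seqname, strand, tuple(sorted(intervals)))].append(transcript_id)

    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical usage:
# with open("braker.gtf") as handle:
#     for group in find_duplicate_transcripts(handle):
#         print("\t".join(group))
```

Applied to the second excerpt above, this would report `g53981.t1` and `g54006.t1` as one group, because their three CDS intervals match exactly. The CDS scores are ignored on purpose, since they differ between the two copies.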
This happens throughout the annotation and complicates visualization and analysis of the genome. Is there a specific reason for annotating genes this way? Is there a way to make sure each gene feature covers only its own transcript(s)?
Kind regards, Heleen
Hello, @HeleenDeWeerd
Did you solve this problem?
Hello @yuzhenpeng
Sadly no. I have seen that one of the GTF files created before the GUSHR step doesn't seem to have this issue, so it might be related to GUSHR. There are still some duplications in that GTF, but the problem is much less pronounced.
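A rough way to compare the two files is to count how many gene features share exactly the same coordinates in each of them. The sketch below assumes tab-delimited GTF input with `gene` lines like the ones in the original post; the function name `genes_sharing_span` is made up for illustration.

```python
import re
from collections import defaultdict


def genes_sharing_span(gtf_lines):
    """Map (seqname, start, end) to the gene IDs declared with that exact span.

    Only spans used by more than one gene are returned. Strand is ignored
    on purpose, because the genes in the example sit on both strands.
    """
    spans = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        # Fall back to the raw attribute column if there is no gene_id key
        # (an assumption about how some gene lines may be written).
        gene_id = match.group(1) if match else fields[8].strip()
        spans[(fields[0], int(fields[3]), int(fields[4]))].append(gene_id)
    return {span: ids for span, ids in spans.items() if len(ids) > 1}
```

Running this on the GTF from before and after the GUSHR step and comparing the number of returned spans should show whether the shared gene coordinates first appear in that step.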
Regards, Heleen
We have the same issue -- lots of overlapping genes.
We have no time to debug this at the moment. Please do not use UTR features of BRAKER for now.
Huiting Zhang wrote on Thu, 2 Feb 2023 at 19:18:
> We have the same issue -- lots of overlapping genes.
I am closing this issue because we won't debug it. We now have a new script to decorate CDS-only transcripts with UTRs.