BRAKER hierarchy of evidence?


Open · sarjopp opened this issue 3 years ago · 1 comment

I am using BRAKER on some hymenopteran genomes. I have RNA-Seq data and have generated a set of training proteins from OrthoDB. When I run BUSCO on my genome, I find ~10 genes that don't show up in the BRAKER annotation. Tracing back, I see that these "missing" BUSCO genes have good protein support, mRNA evidence, and a stop codon. However, they lack a start codon.

Clearly there IS a start codon, because I have evidence of transcription. I am surprised that BRAKER's algorithm automatically rejects potential genes based on an absent start codon when other evidence is strong. I'm wondering: 1) what is the philosophical basis for this? 2) Is there a way to tell BRAKER to also report incomplete gene models?
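A quick way to find models like the ones described above, with a stop codon but no start codon, is to scan a GTF file for transcripts that have a `stop_codon` feature and no `start_codon` feature. This is a minimal sketch, assuming a tab-separated GTF in which every record carries a `transcript_id` attribute in column 9; `find_incomplete_models` is a hypothetical helper, not part of BRAKER, and the example records are made up.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in GTF column 9, e.g. transcript_id "g1.t1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_incomplete_models(gtf_lines):
    """Return sorted transcript IDs that have a stop_codon but no start_codon.

    Hypothetical helper; assumes every relevant record has a transcript_id.
    """
    features = defaultdict(set)  # transcript_id -> set of feature types seen
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            features[match.group(1)].add(cols[2])
    return sorted(
        tid for tid, types in features.items()
        if "stop_codon" in types and "start_codon" not in types
    )


if __name__ == "__main__":
    # Made-up records: g1.t1 is complete, g2.t1 has a stop codon only.
    example = [
        'chr1\tsrc\tstart_codon\t100\t102\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
        'chr1\tsrc\tCDS\t100\t397\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
        'chr1\tsrc\tstop_codon\t398\t400\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
        'chr1\tsrc\tCDS\t900\t1197\t.\t+\t0\ttranscript_id "g2.t1"; gene_id "g2";',
        'chr1\tsrc\tstop_codon\t1198\t1200\t.\t+\t0\ttranscript_id "g2.t1"; gene_id "g2";',
    ]
    print(find_incomplete_models(example))  # ['g2.t1']
```

Because the function accepts any iterable of lines, an open file handle (for example on `gthTrainGenes.gtf`) can be passed directly. Note that a transcript lacking both codon features is not reported by this check.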

When I run GenomeThreader (gth), the missing exons are called, but when I run BRAKER with --gth2traingenes, the exons are present in gthTrainGenes.gtf but absent from all subsequent hints files.
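To narrow down where exons like these drop out, one option is to list the exons in `gthTrainGenes.gtf` that overlap no record in a given hints file. The sketch below assumes both files are tab-separated GTF/GFF with 1-based, inclusive coordinates in columns 4 and 5, that the training-gene exons use the feature type `exon` (an assumption; the parameter can be changed), and it ignores strand. `parse_intervals` and `exons_without_hints` are hypothetical helpers, and the hint record in the example is illustrative only.

```python
from collections import defaultdict


def parse_intervals(lines, feature=None):
    """Yield (seqname, start, end) from GTF/GFF lines, optionally filtered by feature type."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # too few columns to hold coordinates
        if feature is not None and cols[2] != feature:
            continue
        yield cols[0], int(cols[3]), int(cols[4])


def exons_without_hints(train_lines, hint_lines, feature="exon"):
    """Return training-gene intervals that overlap no hint on the same sequence (strand ignored)."""
    hints = defaultdict(list)  # seqname -> list of (start, end)
    for seq, start, end in parse_intervals(hint_lines):
        hints[seq].append((start, end))
    missing = []
    for seq, start, end in parse_intervals(train_lines, feature=feature):
        # Coordinates are 1-based and inclusive, so two intervals overlap
        # when each one starts no later than the other one ends.
        if not any(h_start <= end and start <= h_end for h_start, h_end in hints[seq]):
            missing.append((seq, start, end))
    return missing


if __name__ == "__main__":
    train = [
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\ttranscript_id "t1";',
        'chr1\tsrc\texon\t500\t600\t.\t+\t.\ttranscript_id "t1";',
    ]
    hints = ["chr1\tsrc\tCDSpart\t150\t180\t.\t+\t0\tsrc=P"]
    print(exons_without_hints(train, hints))  # [('chr1', 500, 600)]
```

The linear scan is quadratic in the worst case, which is fine for spot-checking a handful of genes; for whole-genome comparisons, sorting the hints per sequence and sweeping, or using an interval tree, would scale better.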

Thank you, Sara

Jul 03 '21 19:07 sarjopp

Hi Sara,

what is the philosophical basis for this?

There is no real philosophical basis for this; all genes with protein support (even weak support) are much more likely to be predicted than genes without support. There might be many reasons why these genes are not predicted. For example, the protein support might be relatively weak (only one protein supporting the stop codon), and other factors may be decreasing the prediction probability. These other factors can be, e.g., (a) the gene is too short, or (b) parts of the gene are soft-masked.
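Soft-masked bases are written in lowercase in the genome FASTA, so checking factor (b) for a missing gene can be as simple as measuring what fraction of its region is lowercase. This is a minimal sketch, assuming the contig sequence is already loaded as a string and the gene coordinates are 1-based and inclusive, as in GTF/GFF; `softmasked_fraction` is a hypothetical helper, and the sketch says nothing about how BRAKER or AUGUSTUS weights masked sequence.

```python
def softmasked_fraction(sequence, start, end):
    """Fraction of bases in sequence[start..end] (1-based, inclusive) that are soft-masked.

    Soft-masked bases are lowercase in the FASTA sequence.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates out of range")
    region = sequence[start - 1:end]  # convert 1-based inclusive to a Python slice
    masked = sum(1 for base in region if base.islower())
    return masked / len(region)


if __name__ == "__main__":
    contig = "ACGTacgtacGTACGTACGT"  # made-up sequence
    print(softmasked_fraction(contig, 1, 10))  # 0.6
```

A high fraction for a gene that has protein and RNA-Seq evidence would point towards masking as one of the factors suppressing the prediction.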

Is there a way to tell braker to also report incomplete gene models?

BRAKER should be doing this by default.

You can try the tricks described in https://github.com/Gaius-Augustus/BRAKER/issues/395 to enforce the missing genes.

Best, Tomas

Aug 27 '21 17:08 tomasbruna
