
Retrain using braker.gtf + new proteins

Open sarjopp opened this issue 3 years ago • 1 comments

After completing Braker+TSEBRA, I find that there are Busco genes called on the genome that are not present in the set of predicted amino acids. These Busco genes mostly are supported by my RNA-Seq reads. I would like to incorporate these genes in my final prediction set. What is the best way to do so? Do I re-run the pipeline and provide the Busco-generated AA sequences of the "missing" genes as protein evidence? If so, do I still provide the larger protein set I used initially? More fundamentally, how do I tell Braker to use the existing braker.gtf as a starting place, rather than just doing a de novo Braker run?

Also, if I wanted to increase the weight given to expression evidence, how could I do so? It is unclear to me why putative Busco genes that have mRNA support were not called as genes in the initial Braker run.

Many thanks, Sara

sarjopp, Jun 28 '21 18:06

Hi Sara,

For the first problem (using the existing BRAKER model and adding the BUSCO-generated AA sequences of the "missing" genes as protein evidence), you can follow a procedure similar to the one in issue https://github.com/Gaius-Augustus/BRAKER/issues/338: use the same species name, pass the flags --skipAllTraining and --hints TRAINING_WORKDIR/hintsfile.gff, and add the extra BUSCO proteins with --prot_seq busco.fa.
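As a rough sketch, the rerun described above could be assembled like this. The genome file, species name, and working directory are placeholders I made up for illustration, and --genome and --species are the usual BRAKER inputs rather than something spelled out in this thread; check braker.pl --help for your version before running anything.

```shell
# Placeholder values (assumptions, not taken from the thread); adjust to your run.
GENOME="genome.fa"
SPECIES="my_species"            # must match the species name of the first run
TRAINING_WORKDIR="braker_run1"  # working directory of the first BRAKER run
BUSCO_PROTEINS="busco.fa"       # BUSCO-derived proteins of the "missing" genes

# Build the command piece by piece so it can be reviewed before running.
# Note: this simple string form assumes the paths contain no spaces.
cmd="braker.pl --genome=$GENOME --species=$SPECIES"
cmd="$cmd --skipAllTraining"
cmd="$cmd --hints=$TRAINING_WORKDIR/hintsfile.gff"
cmd="$cmd --prot_seq=$BUSCO_PROTEINS"

# Dry run: only print the command. Run it yourself once the paths are correct.
echo "$cmd"
```

Printing the command first is a deliberate choice: a BRAKER run is long, so it is cheaper to catch a wrong path or species name before starting.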

> Also, if I wanted to increase the weight given to expression evidence, how could I do so? It is unclear to me why putative Busco genes that have mRNA support were not called as genes in the initial Braker run.

  • You can increase the weight of particular hints by changing their src=E attribute to src=M in the hints file (the one passed with --hints). All hints with the M source are enforced in the final prediction.
  • If you want to increase the weight of all expression evidence, that can also be done, but the process is a bit more complicated. Let me know if you are interested in more details about this.
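To illustrate the first option, here is a minimal sketch that switches src=E to src=M only for hints inside one region, for example the locus of a missing BUSCO gene. The hint rows, the coordinates, and the file names hints_demo.gff and hints_manual.gff are invented for the example; it assumes a tab-separated GFF hints file with the src attribute in column 9.

```shell
# Toy hints file (invented rows) standing in for the real hintsfile.gff.
printf 'chr1\tb2h\tintron\t100\t200\t0\t+\t.\tmult=5;pri=4;src=E\n' > hints_demo.gff
printf 'chr1\tb2h\tintron\t5000\t5100\t0\t+\t.\tmult=3;pri=4;src=E\n' >> hints_demo.gff
printf 'chr2\tb2h\tintron\t100\t200\t0\t-\t.\tmult=2;pri=4;src=E\n' >> hints_demo.gff

# Hypothetical region of one "missing" BUSCO gene.
REGION_SEQ="chr1"
REGION_START=1
REGION_END=1000

# Rewrite src=E to src=M only for hints fully inside the region;
# every other line is copied through unchanged.
awk -F'\t' -v OFS='\t' -v seq="$REGION_SEQ" -v s="$REGION_START" -v e="$REGION_END" '
  $1 == seq && $4 >= s && $5 <= e {
    sub(/src=E;/, "src=M;", $9)   # src=E in the middle of the attribute list
    sub(/src=E$/, "src=M", $9)    # src=E as the last attribute
  }
  { print }
' hints_demo.gff > hints_manual.gff

cat hints_manual.gff
```

The edited file would then be the one passed with --hints. Restricting the change to chosen loci keeps the enforced hints to the genes you have checked, since every M hint is enforced in the final prediction.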

Sorry for the late response; I hope you still find the advice useful.

Best, Tomas

tomasbruna, Aug 27 '21 16:08