The easiest way to convert braker output to EVM

Open: life404 opened this issue 2 years ago • 1 comment

Run BRAKER with the --gff3 option. The output file braker.gff3 can then be converted to EVM GFF3 format using augustus_GFF3_to_EVM_GFF3.pl. Before converting, however, you need to delete the ; at the end of each line of braker.gff3. For example:

$ head braker.gff3
HiC_scaffold_3  AUGUSTUS        gene    1720719 1722807 .       +       .       ID=jg51092;
HiC_scaffold_3  AUGUSTUS        mRNA    1720719 1722807 .       +       .       ID=jg51092.t1;Parent=jg51092;
HiC_scaffold_3  AUGUSTUS        start_codon     1720719 1720721 .       +       0       ID=jg51092.t1.start1;Parent=jg51092.t1;
HiC_scaffold_3  AUGUSTUS        CDS     1720719 1720866 0.78    +       0       ID=jg51092.t1.CDS1;Parent=jg51092.t1;

$ sed -i 's/;$//g' braker.gff3

$ head braker.gff3
HiC_scaffold_3  AUGUSTUS        gene    1720719 1722807 .       +       .       ID=jg51092
HiC_scaffold_3  AUGUSTUS        mRNA    1720719 1722807 .       +       .       ID=jg51092.t1;Parent=jg51092
HiC_scaffold_3  AUGUSTUS        start_codon     1720719 1720721 .       +       0       ID=jg51092.t1.start1;Parent=jg51092.t1
HiC_scaffold_3  AUGUSTUS        CDS     1720719 1720866 0.78    +       0       ID=jg51092.t1.CDS1;Parent=jg51092.t1
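
If you would rather not edit braker.gff3 in place, the same cleanup can be written to a new file and checked afterwards. This is only a sketch: the scratch directory, the two sample lines and the output name braker.nosemi.gff3 are made up for illustration, and the pattern additionally skips comment lines and tolerates trailing whitespace, which the original sed command does not.

```shell
# Work in a scratch directory on a tiny sample so nothing real is overwritten.
workdir=$(mktemp -d)
printf 'HiC_scaffold_3\tAUGUSTUS\tgene\t1720719\t1722807\t.\t+\t.\tID=jg51092;\n' > "$workdir/braker.gff3"
printf 'HiC_scaffold_3\tAUGUSTUS\tmRNA\t1720719\t1722807\t.\t+\t.\tID=jg51092.t1;Parent=jg51092;\n' >> "$workdir/braker.gff3"

# Strip trailing semicolons (plus any whitespace after them) on non-comment lines,
# writing to a new file (hypothetical name) instead of using -i.
sed -E '/^#/!s/;+[[:space:]]*$//' "$workdir/braker.gff3" > "$workdir/braker.nosemi.gff3"

# Count feature lines that still end in ';' (grep -c exits non-zero on 0 matches).
remaining=$(grep -v '^#' "$workdir/braker.nosemi.gff3" | grep -c ';$' || true)
echo "lines still ending in ';': $remaining"
```

Semicolons between attributes (for example between ID and Parent) are left alone; only the one at the end of the line is removed.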

$ EVidenceModeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl braker.gff3 > braker.evm.gff3

$ head braker.evm.gff3
HiC_scaffold_99 Braker  gene    5882    6218    .       +       .       ID=gene.file_1_file_1_jg16.t1;Name=Braker%20prediction
HiC_scaffold_99 Braker  mRNA    5882    6218    .       +       .       ID=model.file_1_file_1_jg16.t1;Parent=gene.file_1_file_1_jg16.t1;Name=Braker%20prediction
HiC_scaffold_99 Braker  exon    5882    6218    .       +       .       ID=model.file_1_file_1_jg16.t1.exon1;Parent=model.file_1_file_1_jg16.t1
HiC_scaffold_99 Braker  CDS     5882    6218    .       +       .       ID=cds.model.file_1_file_1_jg16.t1;Parent=model.file_1_file_1_jg16.t1

If you skip this step, the ; at the end of each line causes errors in the converted EVM GFF3 file:

$ EVidenceModeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl braker.gff3|head
HiC_scaffold_96 Augustus        gene    13129   13184   .       -       .       ID=gene.file_1_file_1_jg32.t1;;Name=Augustus%20prediction
HiC_scaffold_96 Augustus        mRNA    13129   13184   .       -       .       ID=model.file_1_file_1_jg32.t1;;Parent=gene.file_1_file_1_jg32.t1;;Name=Augustus%20prediction
HiC_scaffold_96 Augustus        exon    13129   13184   .       -       .       ID=model.file_1_file_1_jg32.t1;.exon1;Parent=model.file_1_file_1_jg32.t1
HiC_scaffold_96 Augustus        CDS     13129   13184   .       -       .       ID=cds.model.file_1_file_1_jg32.t1;;Parent=model.file_1_file_1_jg32.t1

As you can see, in the exon line a stray ; appears in the middle of the ID: ID=model.file_1_file_1_jg32.t1;.exon1; . This stray ; causes the error "ERROR, CDS cds.model.file_1_file_1_jg32.t1.HiC_scaffold_14:43163129-43163185 does not fully map within an exon record." when you run the validator from EVM.
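
A quick way to spot this before running the validator is to scan the 9th column of the converted file for the two symptoms seen above, a doubled ;; or a ; directly followed by a dot. This is a rough heuristic, not part of EVM; the sample lines are copied from the outputs in this post and the scratch file is only there so the snippet runs on its own.

```shell
# Build a small sample: two broken lines and one clean line.
workdir=$(mktemp -d)
evm="$workdir/braker.evm.gff3"
printf 'HiC_scaffold_96\tAugustus\tgene\t13129\t13184\t.\t-\t.\tID=gene.file_1_file_1_jg32.t1;;Name=Augustus%%20prediction\n' > "$evm"
printf 'HiC_scaffold_96\tAugustus\texon\t13129\t13184\t.\t-\t.\tID=model.file_1_file_1_jg32.t1;.exon1;Parent=model.file_1_file_1_jg32.t1\n' >> "$evm"
printf 'HiC_scaffold_99\tBraker\tCDS\t5882\t6218\t.\t+\t.\tID=cds.model.file_1_file_1_jg16.t1;Parent=model.file_1_file_1_jg16.t1\n' >> "$evm"

# Count non-comment lines whose attribute column contains ';;' or ';.'.
suspicious=$(awk -F'\t' '!/^#/ && ($9 ~ /;;/ || $9 ~ /;\./) { bad++ } END { print bad + 0 }' "$evm")
echo "suspicious attribute columns: $suspicious"
```

A non-zero count suggests the trailing semicolons were not removed from braker.gff3 before conversion.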

The braker.gtf file can also be converted to GFF3 format with the method in #123; then convert the resulting GFF3 file following the steps above.

Alternatively, braker.gtf can be converted directly using augustus_GTF_to_EVM_GFF3.pl, provided you have manually corrected the order of gene_id and transcript_id in the 9th column of the GTF.

An incorrect order causes the error "Error, cannot parse gene_id and transcript_id from HiC_scaffold_13".
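
The reordering does not have to be done by hand. The sketch below swaps the two attributes with sed, under the assumption that the script wants transcript_id before gene_id. That order is not stated in this post, so check the regular expression in your copy of augustus_GTF_to_EVM_GFF3.pl first. The file names and the sample line (g1, g1.t1) are invented for the example.

```shell
# Tiny sample GTF line with gene_id first (hypothetical IDs).
workdir=$(mktemp -d)
printf 'HiC_scaffold_13\tAUGUSTUS\tCDS\t100\t250\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";\n' > "$workdir/braker.gtf"

# Swap 'gene_id "..."; transcript_id "...";' into 'transcript_id "..."; gene_id "...";'.
# ASSUMPTION: transcript_id-first is the order the converter expects.
sed -E 's/(gene_id "[^"]*";) *(transcript_id "[^"]*";)/\2 \1/' \
  "$workdir/braker.gtf" > "$workdir/braker.reordered.gtf"

cut -f9 "$workdir/braker.reordered.gtf"
```

If your copy of the script expects the opposite order, exchange the two attribute names in the pattern. Lines whose 9th column holds only a bare ID are not touched.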

However, the EVM GFF3 file produced by augustus_GTF_to_EVM_GFF3.pl assigns the same ID to different genes, which causes the error "Error, feature: HiC_scaffold_5-jg35973 is described multiple times with different data values:" when you run the validator from EVM.
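
To see which genes are affected before running the validator, you can list gene IDs that appear on the same scaffold with different coordinates. This is a minimal sketch and not the validator's own logic; the file name and the three sample lines are hypothetical and only mimic the scaffold-ID key printed in the error message.

```shell
# Sample: the same gene ID on two different spans, plus one unique gene.
workdir=$(mktemp -d)
gff="$workdir/from_gtf.evm.gff3"
printf 'HiC_scaffold_5\tAugustus\tgene\t100\t900\t.\t+\t.\tID=jg35973\n' > "$gff"
printf 'HiC_scaffold_5\tAugustus\tgene\t5000\t7000\t.\t+\t.\tID=jg35973\n' >> "$gff"
printf 'HiC_scaffold_5\tAugustus\tgene\t9000\t9500\t.\t+\t.\tID=jg35974\n' >> "$gff"

# For every gene line, remember start-end per scaffold-ID key and
# report keys that show up again with different coordinates.
dups=$(awk -F'\t' '
  $3 == "gene" && match($9, /ID=[^;]+/) {
    id = substr($9, RSTART + 3, RLENGTH - 3)
    key = $1 "-" id
    if (key in span && span[key] != $4 "-" $5) conflict[key] = 1
    span[key] = $4 "-" $5
  }
  END { for (key in conflict) print key }
' "$gff" | sort)
echo "$dups"
```

Any key printed here is a candidate for the "described multiple times" error.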

life404, Apr 13 '22 04:04

wonderful

Jiangjiangzhang6, Aug 29 '22 02:08