Format issues between gffread versions

Open EA2106-Universite-Francois-Rabelais opened this issue 3 years ago • 0 comments

Dear @gpertea, we found some strange behavior while creating databases for snpEff (https://github.com/pcingola/SnpEff). We used gffread to convert a .gff3 file to GTF format, but gffread-0.12.4 and gffread-0.12.7 produce different output from the same input, as the md5sums show:

$ md5sum test.2.12.7.gtf
6aa324e5d0b46410f1dc212eba2d8b44  test.2.12.7.gtf
$ md5sum test.2.12.4.gtf
cfdf64f1143bb34e9038ec70ca43bb8e  test.2.12.4.gtf

Output of 0.12.4:

chr00   maker   transcript      131062  131377  .       +       .       transcript_id "MELO3C027429.2.1"; gene_id "MELO3C027429.2.1";
chr00   maker   exon    131062  131377  .       +       .       transcript_id "MELO3C027429.2.1";
chr00   maker   CDS     131121  131174  .       +       0       transcript_id "MELO3C027429.2.1";

Output of 0.12.7:

chr00   maker   transcript      131062  131377  .       +       .       transcript_id "MELO3C027429.2.1"; gene_id "MELO3C027429.2.1"
chr00   maker   exon    131062  131377  .       +       .       transcript_id "MELO3C027429.2.1";
chr00   maker   CDS     131121  131174  .       +       0       transcript_id "MELO3C027429.2.1";

In the output of 0.12.7, the trailing ';' is missing after every 'gene_id' attribute. I do not know whether other downstream programs are also affected, but it is clearly a problem when building snpEff databases!
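For anyone hitting the same difference, here is a minimal Python sketch of a possible workaround: it checks whether the attribute column of each GTF line ends with ';' and appends one if it does not. This is not a gffread feature. The function names `attributes_terminated` and `fix_line` are hypothetical, and the sketch assumes a tab-separated GTF with the attributes in the ninth column.

```python
def attributes_terminated(line: str) -> bool:
    """Return True if the GTF attribute column (9th field) ends with ';'.

    Assumption: fields are tab-separated. Comment lines and lines with
    fewer than nine fields are not checked and are reported as fine.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return True
    return fields[8].rstrip().endswith(";")


def fix_line(line: str) -> str:
    """Append the missing trailing ';' to a GTF line, if needed."""
    if attributes_terminated(line):
        return line
    newline = "\n" if line.endswith("\n") else ""
    return line.rstrip("\n").rstrip() + ";" + newline


if __name__ == "__main__":
    import sys

    # Usage sketch: python fix_gtf.py < test.2.12.7.gtf > fixed.gtf
    for gtf_line in sys.stdin:
        sys.stdout.write(fix_line(gtf_line))
```

Running the 0.12.7 output through this filter should make the transcript lines end with ';' again, as they do in the 0.12.4 output. Whether that alone is enough for snpEff is an assumption that has not been verified here.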

Thanks in advance!