Format issues between gffread versions
Dear @gpertea, we found some strange behavior while creating databases for snpEff (https://github.com/pcingola/SnpEff). We used gffread to convert a .gff3 file to GTF format, but the output of gffread-0.12.4 and gffread-0.12.7 is not the same. With the same input, the md5sums differ:
$ md5sum test.2.12.7.gtf
6aa324e5d0b46410f1dc212eba2d8b44 test.2.12.7.gtf
$ md5sum test.2.12.4.gtf
cfdf64f1143bb34e9038ec70ca43bb8e test.2.12.4.gtf
output of 0.12.4:
chr00 maker transcript 131062 131377 . + . transcript_id "MELO3C027429.2.1"; gene_id "MELO3C027429.2.1";
chr00 maker exon 131062 131377 . + . transcript_id "MELO3C027429.2.1";
chr00 maker CDS 131121 131174 . + 0 transcript_id "MELO3C027429.2.1";
output of 0.12.7:
chr00 maker transcript 131062 131377 . + . transcript_id "MELO3C027429.2.1"; gene_id "MELO3C027429.2.1"
chr00 maker exon 131062 131377 . + . transcript_id "MELO3C027429.2.1";
chr00 maker CDS 131121 131174 . + 0 transcript_id "MELO3C027429.2.1";
The terminating ';' is missing after every 'gene_id' attribute in the output of 0.12.7. I do not know whether other downstream programs are also affected, but it is clearly a problem for building snpEff databases!
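In case it helps anyone hitting the same problem, here is a minimal sketch of how the affected lines could be detected and patched after conversion. It is only a workaround under some assumptions: the GTF is tab-separated with the attributes in column 9, and the helper names `missing_terminator` and `repair_line` are made up for this example (they are not part of gffread or snpEff).

```python
def missing_terminator(line: str) -> bool:
    """Return True if a GTF feature line's attribute column does not end with ';'."""
    # Skip blank lines and comment/header lines.
    if not line.strip() or line.startswith("#"):
        return False
    fields = line.rstrip("\r\n").split("\t")
    # Assumption: a feature line has at least 9 tab-separated columns.
    if len(fields) < 9:
        return False
    attrs = fields[8].rstrip()
    return bool(attrs) and not attrs.endswith(";")


def repair_line(line: str) -> str:
    """Append the missing ';' to the attribute column, leaving other lines untouched."""
    if not missing_terminator(line):
        return line
    newline = "\n" if line.endswith("\n") else ""
    return line.rstrip("\r\n").rstrip() + ";" + newline


if __name__ == "__main__":
    # Transcript line as written by 0.12.7 (no ';' after gene_id).
    sample = "\t".join([
        "chr00", "maker", "transcript", "131062", "131377", ".", "+", ".",
        'transcript_id "MELO3C027429.2.1"; gene_id "MELO3C027429.2.1"',
    ]) + "\n"
    print(missing_terminator(sample))
    print(repair_line(sample), end="")
```

Applying `repair_line` to every line of the 0.12.7 output should bring the transcript lines back to the form 0.12.4 wrote, at least for the lines shown above; I have not checked whether anything else differs between the two files.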
Thanks in advance!