GFF to GTF modifies transcript_id
Describe the bug
While converting a GFF file (already fixed with agat_convert_sp_gxf2gxf.pl) to GTF using agat_convert_sp_gff2gtf.pl, the transcript_id is modified for some records, but not all.
General (please complete the following information):
- AGAT version v1.4.3
- AGAT installation/use bioconda
- OS: Ubuntu
To Reproduce
agat_convert_sp_gff2gtf.pl --gff genomic.agat.gff -o genomic.agat.gtf
Sample records from input GFF file
# input file gff records
NC_056623.2 Gnomon mRNA 986727 990205 . - . ID=rna-XM_023890824.3;Parent=gene-LOC111894727;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;Name=XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;model_evidence=Supporting evidence includes similarity to: 1 Protein;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 986727 987322 . - . ID=exon-XM_023890824.3-8;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 987411 987492 . - . ID=exon-XM_023890824.3-7;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 987699 987836 . - . ID=exon-XM_023890824.3-6;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 987920 987991 . - . ID=exon-XM_023890824.3-5;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 988096 988164 . - . ID=exon-XM_023890824.3-4;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 988247 988312 . - . ID=exon-XM_023890824.3-3;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 988390 988608 . - . ID=exon-XM_023890824.3-2;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon exon 988704 990205 . - . ID=exon-XM_023890824.3-1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 Gnomon CDS 987297 987322 . - 2 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 987411 987492 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 987699 987836 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 987920 987991 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 988096 988164 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 988247 988312 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 988390 988608 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 Gnomon CDS 988704 989096 . - 0 ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2 AGAT five_prime_UTR 989097 990205 . - . ID=agat-five_prime_utr-15681;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2 AGAT three_prime_UTR 986727 987296 . - . ID=agat-three_prime_utr-12686;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
Observed output GTF lines for these records
NC_056623.2 Gnomon mRNA 986727 990205 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "rna-XM_023890824.3"; Name "XM_023890824.3"; Parent "gene-LOC111894727"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; model_evidence "Supporting evidence includes similarity to: 1 Protein"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 986727 987322 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-8"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 987411 987492 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-7"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 987699 987836 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-6"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 987920 987991 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-5"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 988096 988164 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-4"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 988247 988312 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-3"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 988390 988608 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-2"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon exon 988704 990205 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-1"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 Gnomon CDS 987297 987322 . - 2 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 987411 987492 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 987699 987836 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 987920 987991 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 988096 988164 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 988247 988312 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 988390 988608 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 Gnomon CDS 988704 989096 . - 0 gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2 AGAT five_prime_UTR 989097 990205 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "agat-five_prime_utr-15681"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2 AGAT three_prime_UTR 986727 987296 . - . gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "agat-three_prime_utr-12686"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
Expected behavior
Transcript ID in the input GFF (ID attribute): rna-XM_023890824.3
transcript_id in the output GTF: XM_023890824.3
Expected transcript_id: rna-XM_023890824.3
This is not consistent across all gene features, which is what causes the problem.
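The affected records can be listed from the output alone, since the GTF lines above keep the original ID attribute next to transcript_id. A minimal Python sketch, assuming tab-separated columns and that transcripts appear as mRNA or transcript features; the helper names (parse_gtf_attributes, find_renamed_transcripts) are hypothetical and not part of AGAT:

```python
import re
import sys

# Matches GTF attributes of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_attributes(column):
    """Parse a GTF attribute column, keeping the first value seen for each key."""
    attributes = {}
    for key, value in ATTRIBUTE_PATTERN.findall(column):
        attributes.setdefault(key, value)
    return attributes


def find_renamed_transcripts(gtf_lines, transcript_types=("mRNA", "transcript")):
    """Return (ID, transcript_id) pairs for transcript records where the two differ."""
    renamed = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] not in transcript_types:
            continue  # only look at well-formed transcript-level records
        attributes = parse_gtf_attributes(columns[8])
        original_id = attributes.get("ID")
        transcript_id = attributes.get("transcript_id")
        if original_id and transcript_id and original_id != transcript_id:
            renamed.append((original_id, transcript_id))
    return renamed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_gtf_ids.py genomic.agat.gtf
    with open(sys.argv[1]) as handle:
        for original_id, transcript_id in find_renamed_transcripts(handle):
            print(original_id, transcript_id, sep="\t")
```

Running it on genomic.agat.gtf would print one line per transcript whose transcript_id no longer matches its ID, which gives a count of how many records are affected.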
Thanks,
Best, Siva
I think I made some updates related to this in the most recent versions. Could you give it a try?
I just tried with v1.5.1 (from bioconda), and I still see the same output (stripped output).
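One thing that may be worth checking on the input side: in the sample above, the mRNA record already carries a transcript_id attribute (XM_023890824.3) that differs from its ID (rna-XM_023890824.3), and the output transcript_id matches that attribute. Whether this explains why only some transcripts are affected is an assumption, not something confirmed in this thread. A rough Python sketch to list such records in the input GFF (the function names are hypothetical, and tab-separated GFF3 columns are assumed):

```python
import sys


def parse_gff3_attributes(column):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attributes = {}
    for field in column.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def transcripts_with_differing_transcript_id(gff_lines, transcript_types=("mRNA", "transcript")):
    """Return (ID, transcript_id) pairs for transcript records whose two attributes differ."""
    differing = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] not in transcript_types:
            continue  # only look at well-formed transcript-level records
        attributes = parse_gff3_attributes(columns[8])
        record_id = attributes.get("ID")
        transcript_id = attributes.get("transcript_id")
        if record_id and transcript_id and record_id != transcript_id:
            differing.append((record_id, transcript_id))
    return differing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_gff_ids.py genomic.agat.gff
    with open(sys.argv[1]) as handle:
        for record_id, transcript_id in transcripts_with_differing_transcript_id(handle):
            print(record_id, transcript_id, sep="\t")
```

If the records listed here are the same ones whose transcript_id changes in the GTF, that would point to the existing transcript_id attribute as the source of the value.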