AGAT

GFF to GTF modifies transcript_id

Open sivasubramanics opened this issue 5 months ago • 2 comments

Describe the bug

While converting a GFF file (already fixed with agat_convert_sp_gxf2gxf.pl) to GTF using agat_convert_sp_gff2gtf.pl, the transcript_id is modified for some records, but not all.

General (please complete the following information):

  • AGAT version: v1.4.3
  • AGAT installation/use: bioconda
  • OS: Ubuntu

To Reproduce

agat_convert_sp_gff2gtf.pl --gff genomic.agat.gff -o genomic.agat.gtf

Sample records from input GFF file

# input file gff records
NC_056623.2     Gnomon  mRNA    986727  990205  .       -       .       ID=rna-XM_023890824.3;Parent=gene-LOC111894727;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;Name=XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;model_evidence=Supporting evidence includes similarity to: 1 Protein;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    986727  987322  .       -       .       ID=exon-XM_023890824.3-8;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    987411  987492  .       -       .       ID=exon-XM_023890824.3-7;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    987699  987836  .       -       .       ID=exon-XM_023890824.3-6;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    987920  987991  .       -       .       ID=exon-XM_023890824.3-5;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    988096  988164  .       -       .       ID=exon-XM_023890824.3-4;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    988247  988312  .       -       .       ID=exon-XM_023890824.3-3;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    988390  988608  .       -       .       ID=exon-XM_023890824.3-2;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  exon    988704  990205  .       -       .       ID=exon-XM_023890824.3-1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     Gnomon  CDS     987297  987322  .       -       2       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     987411  987492  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     987699  987836  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     987920  987991  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     988096  988164  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     988247  988312  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     988390  988608  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     Gnomon  CDS     988704  989096  .       -       0       ID=cds-XP_023746592.1;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XP_023746592.1;Name=XP_023746592.1;gbkey=CDS;gene=LOC111894727;product=transcription factor bHLH74 isoform X1;protein_id=XP_023746592.1
NC_056623.2     AGAT    five_prime_UTR  989097  990205  .       -       .       ID=agat-five_prime_utr-15681;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
NC_056623.2     AGAT    three_prime_UTR 986727  987296  .       -       .       ID=agat-three_prime_utr-12686;Parent=rna-XM_023890824.3;Dbxref=GeneID:111894727,GenBank:XM_023890824.3;experiment=COORDINATES: polyA evidence [ECO:0006239];gbkey=mRNA;gene=LOC111894727;product=transcription factor bHLH74%2C transcript variant X1;transcript_id=XM_023890824.3
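
Note that the sample mRNA record carries both an ID attribute and a separate transcript_id attribute, and the two values differ. The following is a minimal sketch for pulling both out of a GFF3 attribute column so they can be inspected side by side. The helper name parse_gff3_attributes is hypothetical and this is not AGAT's own parsing code; the attribute string is abbreviated from the mRNA record above.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict.

    Hypothetical helper for illustration only; percent-encoded
    characters such as %2C are decoded.
    """
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attributes[key] = unquote(value)
    return attributes


# Attribute column of the mRNA record, abbreviated to the relevant keys.
mrna_attributes = parse_gff3_attributes(
    "ID=rna-XM_023890824.3;Parent=gene-LOC111894727;"
    "Name=XM_023890824.3;transcript_id=XM_023890824.3"
)
print(mrna_attributes["ID"])             # rna-XM_023890824.3
print(mrna_attributes["transcript_id"])  # XM_023890824.3
```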

Observed output GTF lines for these records

NC_056623.2	Gnomon	mRNA	986727	990205	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "rna-XM_023890824.3"; Name "XM_023890824.3"; Parent "gene-LOC111894727"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; model_evidence "Supporting evidence includes similarity to: 1 Protein"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	986727	987322	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-8"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	987411	987492	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-7"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	987699	987836	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-6"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	987920	987991	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-5"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	988096	988164	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-4"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	988247	988312	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-3"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	988390	988608	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-2"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	exon	988704	990205	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "exon-XM_023890824.3-1"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	Gnomon	CDS	987297	987322	.	-	2	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	987411	987492	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	987699	987836	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	987920	987991	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	988096	988164	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	988247	988312	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	988390	988608	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	Gnomon	CDS	988704	989096	.	-	0	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XP_023746592.1"; ID "cds-XP_023746592.1"; Name "XP_023746592.1"; Parent "rna-XM_023890824.3"; gbkey "CDS"; gene "LOC111894727"; product "transcription factor bHLH74 isoform X1"; protein_id "XP_023746592.1";
NC_056623.2	AGAT	five_prime_UTR	989097	990205	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "agat-five_prime_utr-15681"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";
NC_056623.2	AGAT	three_prime_UTR	986727	987296	.	-	.	gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; Dbxref "GeneID:111894727" "GenBank:XM_023890824.3"; ID "agat-three_prime_utr-12686"; Parent "rna-XM_023890824.3"; experiment "COORDINATES: polyA evidence [ECO:0006239]"; gbkey "mRNA"; gene "LOC111894727"; product "transcription factor bHLH74, transcript variant X1";

Expected behavior

  • transcript_id from the input GFF: rna-XM_023890824.3
  • transcript_id in the output GTF: XM_023890824.3
  • expected transcript_id: rna-XM_023890824.3

This behavior is not consistent across all gene features, which is why it causes a problem.
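
One possible way to list which transcripts are affected is to scan the output GTF and compare each transcript line's transcript_id with the ID attribute that the GTF still carries (as in the observed mRNA line above). This is only a sketch under that assumption: the function names are hypothetical, the default feature type of mRNA is a guess that may need extending to other transcript types, and the regular expression keeps only the first value of multi-valued keys such as Dbxref.

```python
import re

# Matches `key "value"` pairs in a GTF attribute column.
GTF_PAIR = re.compile(r'(\w+) "([^"]*)"')


def gtf_attributes(column9: str) -> dict[str, str]:
    """Return the first value of every key in a GTF attribute column."""
    attributes = {}
    for key, value in GTF_PAIR.findall(column9):
        attributes.setdefault(key, value)
    return attributes


def rewritten_transcript_ids(gtf_lines, transcript_types=("mRNA",)):
    """Yield (ID, transcript_id) for transcript lines where the two differ."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] not in transcript_types:
            continue
        attributes = gtf_attributes(columns[8])
        original_id = attributes.get("ID")
        transcript_id = attributes.get("transcript_id")
        if original_id and transcript_id and original_id != transcript_id:
            yield original_id, transcript_id


# The observed mRNA line, abbreviated to the relevant attributes.
example = "\t".join([
    "NC_056623.2", "Gnomon", "mRNA", "986727", "990205", ".", "-", ".",
    'gene_id "gene-LOC111894727"; transcript_id "XM_023890824.3"; '
    'ID "rna-XM_023890824.3"; Parent "gene-LOC111894727";',
])
print(list(rewritten_transcript_ids([example])))
# [('rna-XM_023890824.3', 'XM_023890824.3')]
```

To run it over the whole file, an open file handle can be passed in place of the list, for example `rewritten_transcript_ids(open("genomic.agat.gtf"))`.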

Thanks,

Best, Siva

sivasubramanics, Jul 28 '25 21:07

I think I made some updates related to this in the most recent versions. Could you give it a try?

Juke34, Jul 28 '25 22:07

I just tried with v1.5.1 (from bioconda), and I still see the same output (stripped output).

sivasubramanics, Jul 31 '25 20:07