Error when running IsoQuant with a customized GTF file
Hi,
I'm using IsoQuant for transcript discovery. I deleted 50% of the transcripts on chr1 from the annotation, but now the run fails before it completes:
2025-01-07 09:44:46,678 - INFO - Running IsoQuant version 3.6.1
2025-01-07 09:44:46,678 - WARNING - Output folder already exists, some files may be overwritten.
2025-01-07 09:44:46,681 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2025-01-07 09:44:46,682 - INFO - === IsoQuant pipeline started ===
2025-01-07 09:44:46,682 - INFO - gffutils version: 0.13
2025-01-07 09:44:46,682 - INFO - pysam version: 0.22.1
2025-01-07 09:44:46,682 - INFO - pyfaidx version: 0.8.1.3
2025-01-07 09:44:46,682 - INFO - Converting gene annotation file to .db format (takes a while)...
/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py:770: UserWarning: It appears you have a gene feature in your GTF file. You may want to use the `disable_infer_genes=True` option to speed up database creation
warnings.warn(
/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py:763: UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts=True` option to speed up database creation
warnings.warn(
2025-01-07 09:44:46,690 - CRITICAL - IsoQuant failed with the following error, please, submit this issue to https://github.com/ablab/IsoQuant/issuesTraceback (most recent call last):
File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 790, in _populate_from_lines
self._insert(f, c)
File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 566, in _insert
cursor.execute(constants._INSERT, feature.astuple())
sqlite3.IntegrityError: UNIQUE constraint failed: features.id
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 819, in <module>
main(sys.argv[1:])
File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 813, in main
run_pipeline(args)
File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 749, in run_pipeline
args.genedb = convert_gtf_to_db(args)
File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 144, in convert_gtf_to_db
gtf_filename, genedb_filename = convert_db(gtf_filename, genedb_filename, gtf2db, args)
File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 360, in convert_db
convert_fn(gtf_filename, genedb_filename, args.complete_genedb, args.gtf_check)
File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 133, in gtf2db
gffutils.create_db(gtf, db, force=True, keep_order=True, merge_strategy='error',
File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 1401, in create_db
c.create()
File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 543, in create
self._populate_from_lines(self.iterator)
File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 792, in _populate_from_lines
fixed, final_strategy = self._do_merge(f, self.merge_strategy)
File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 257, in _do_merge
raise ValueError("Duplicate ID {0.id}".format(f))
ValueError: Duplicate ID ENST00000619216
All the other chromosomes seem to be fine, because I can see results for them.
Dear @SuiYue-2308
This error is raised by gffutils, which converts the GTF file to an internal database format.
It complains about duplicated ids:
ValueError: Duplicate ID ENST00000619216
According to the format specification, all IDs in a GTF/GFF file must be distinct, even across different feature types, meaning that a gene and a transcript cannot share the same ID.
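A quick way to locate such collisions before running IsoQuant is to scan the GTF yourself. The snippet below is only a minimal sketch, not part of IsoQuant or gffutils: the function name `find_duplicate_ids` is hypothetical, and it assumes that gene records are keyed by `gene_id` and transcript records by `transcript_id`, which may not match every annotation or every gffutils configuration.

```python
import re
from collections import defaultdict

# Assumption: "gene" lines are identified by gene_id and "transcript" lines by
# transcript_id; all other feature types (exon, CDS, ...) are ignored here.
ID_ATTRIBUTE = {"gene": "gene_id", "transcript": "transcript_id"}


def find_duplicate_ids(gtf_lines):
    """Return {id: [(line_number, feature_type), ...]} for IDs used more than once."""
    seen = defaultdict(list)
    for line_number, line in enumerate(gtf_lines, start=1):
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        feature_type = fields[2]
        attribute = ID_ATTRIBUTE.get(feature_type)
        if attribute is None:
            continue
        # Match e.g. gene_id "ENSG..." at the start of the attribute column
        # or right after a semicolon.
        pattern = r'(?:^|;)\s*' + attribute + r'\s+"([^"]+)"'
        match = re.search(pattern, fields[8])
        if match:
            seen[match.group(1)].append((line_number, feature_type))
    # Keep only IDs that occur on more than one gene/transcript line.
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}
```

Passing an open file handle, e.g. `find_duplicate_ids(open("annotation.gtf"))` with your own file name, returns each repeated ID together with the line numbers where it occurs, so the offending records can be removed or renamed by hand.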
How did you get this annotation?
Best Andrey
Hi Andrey,
Thank you so much for your reply! I had added some genes to the GTF. After I removed the duplicate ID, it works now!
Thank you!
Best Yue
@SuiYue-2308 I'm glad it worked out!