
Error when running IsoQuant with a customized GTF file

Open SuiYue-2308 opened this issue 1 year ago • 3 comments

Hi,

I'm using IsoQuant for transcript discovery. I deleted 50% of the transcripts on chr1 from the annotation, but the run does not complete:

2025-01-07 09:44:46,678 - INFO - Running IsoQuant version 3.6.1
2025-01-07 09:44:46,678 - WARNING - Output folder already exists, some files may be overwritten.
2025-01-07 09:44:46,681 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2025-01-07 09:44:46,682 - INFO -  === IsoQuant pipeline started === 
2025-01-07 09:44:46,682 - INFO - gffutils version: 0.13
2025-01-07 09:44:46,682 - INFO - pysam version: 0.22.1
2025-01-07 09:44:46,682 - INFO - pyfaidx version: 0.8.1.3
2025-01-07 09:44:46,682 - INFO - Converting gene annotation file to .db format (takes a while)...
/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py:770: UserWarning: It appears you have a gene feature in your GTF file. You may want to use the `disable_infer_genes=True` option to speed up database creation
  warnings.warn(
/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py:763: UserWarning: It appears you have a transcript feature in your GTF file. You may want to use the `disable_infer_transcripts=True` option to speed up database creation
  warnings.warn(
2025-01-07 09:44:46,690 - CRITICAL - IsoQuant failed with the following error, please, submit this issue to https://github.com/ablab/IsoQuant/issues
Traceback (most recent call last):
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 790, in _populate_from_lines
    self._insert(f, c)
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 566, in _insert
    cursor.execute(constants._INSERT, feature.astuple())
sqlite3.IntegrityError: UNIQUE constraint failed: features.id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 819, in <module>
    main(sys.argv[1:])
  File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 813, in main
    run_pipeline(args)
  File "/home/suiyue/Documents/other_method/IsoQuant/isoquant.py", line 749, in run_pipeline
    args.genedb = convert_gtf_to_db(args)
  File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 144, in convert_gtf_to_db
    gtf_filename, genedb_filename = convert_db(gtf_filename, genedb_filename, gtf2db, args)
  File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 360, in convert_db
    convert_fn(gtf_filename, genedb_filename, args.complete_genedb, args.gtf_check)
  File "/home/suiyue/Documents/other_method/IsoQuant/src/gtf2db.py", line 133, in gtf2db
    gffutils.create_db(gtf, db, force=True, keep_order=True, merge_strategy='error',
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 1401, in create_db
    c.create()
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 543, in create
    self._populate_from_lines(self.iterator)
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 792, in _populate_from_lines
    fixed, final_strategy = self._do_merge(f, self.merge_strategy)
  File "/home/suiyue/.local/lib/python3.10/site-packages/gffutils/create.py", line 257, in _do_merge
    raise ValueError("Duplicate ID {0.id}".format(f))
ValueError: Duplicate ID ENST00000619216

All the other chromosomes seem to be fine, because I can see results for them.

SuiYue-2308 · Jan 07 '25

Dear @SuiYue-2308

This error is raised by gffutils, which converts the GTF file into an internal database format. It complains about a duplicated ID: ValueError: Duplicate ID ENST00000619216

By the format specification, all IDs in a GTF/GFF file must be distinct, even across different feature types; for example, a gene and a transcript cannot have the same ID.
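To find such clashes before running IsoQuant, the GTF can be scanned for IDs that would be inserted into the database more than once. The sketch below is a minimal check, assuming gffutils' default GTF behaviour of keying "gene" features by gene_id and "transcript" features by transcript_id (IsoQuant's own conversion settings may differ). find_duplicate_ids is a hypothetical helper, not part of IsoQuant or gffutils.

```python
import re
from collections import Counter

# Assumption: like gffutils' default GTF handling, "gene" records are keyed by
# gene_id and "transcript" records by transcript_id; other features (exons, CDS, ...)
# get auto-generated IDs and are ignored here.
ID_ATTRIBUTE = {"gene": "gene_id", "transcript": "transcript_id"}

# Matches key "value" pairs in column 9, e.g. gene_id "ENSG..."; transcript_id "ENST...";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def find_duplicate_ids(lines):
    """Return {id: count} for gene/transcript IDs that appear on more than one record.

    Counts are shared across feature types, so a gene and a transcript that use the
    same ID string are also reported as a clash.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        key = ID_ATTRIBUTE.get(fields[2])
        if key is None:
            continue  # feature type without an explicit ID
        attrs = dict(ATTR_RE.findall(fields[8]))
        if key in attrs:
            counts[attrs[key]] += 1
    return {fid: n for fid, n in counts.items() if n > 1}
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, e.g. find_duplicate_ids(open("annotation.gtf")) with a placeholder path. For the annotation in this issue, it would be expected to return something like {'ENST00000619216': 2} if that transcript ID appears on two transcript records.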

How did you get this annotation?

Best,
Andrey

andrewprzh · Jan 08 '25

Hi Andrey,

Thank you so much for your reply! I had added some genes to the GTF. After I removed the duplicate ID, it works now!

Thank you!

Best,
Yue

SuiYue-2308 · Jan 14 '25

@SuiYue-2308 I'm glad it worked out!

andrewprzh · Jan 14 '25