GFF3 cannot be recognized
Hi,
The tool says that it can work with GFF3. But it only works with GTF. Can we get GFF3 support?
This is the error I get when I provide a GFF3-formatted file with the --genedb option:
2024-09-19 11:35:13,297 - ERROR - Input GTF seems to be corrupted (see warnings above).
2024-09-19 11:35:13,297 - ERROR - An attempt to correct this GTF was made, the result is written to dummy.corrected.gff3
2024-09-19 11:35:13,297 - ERROR - NB! some transcript / gene ids in the corrected annotation are modified.
2024-09-19 11:35:13,297 - ERROR - Provide a correct GTF by fixing the original input GTF or checking the corrected one.
Do you consume the gene annotations in GTF format or BED12 format? Is it OK to provide a BED12 file directly?
Thanks Abhijit
Dear @sanyalab
IsoQuant does support both GTF and GFF, but not BED. Could you send me the entire isoquant.log file?
Also, you can try running IsoQuant with --no_gtf_check.
Best Andrey
Hi Andrey,
I actually went ahead and converted the GFF3 to a geneDB format using gffutils, as a preprocessing step. It seems to be running fine now. The isoquant.log file is 152 MB in size, so I cannot upload it. But here are the first 10 lines and the last 10.
FIRST:
Command line: isoquant.py --reference genome.fa --genedb Annotation.gff3 --fastq Sample1.flnc.fastq Sample2.flnc.fastq Sample3.flnc.fastq Sample4.flnc.fastq --output FL_ALL --prefix OUT --data_type pacbio_ccs --fl_data --threads 24 --check_canonical --sqanti_output --matching_strategy precise --splice_correction_strategy default_pacbio --model_construction_strategy fl_pacbio
2024-09-19 11:34:28,180 - INFO - Running IsoQuant version 3.5.0
2024-09-19 11:34:28,222 - INFO - === IsoQuant pipeline started ===
2024-09-19 11:34:28,222 - INFO - gffutils version: 0.13
2024-09-19 11:34:28,223 - INFO - pysam version: 0.22.1
2024-09-19 11:34:28,223 - INFO - pyfaidx version: 0.8.1.1
2024-09-19 11:34:28,228 - INFO - Checking input gene annotation
2024-09-19 11:34:29,316 - WARNING - Malformed GTF line 2 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,316 - WARNING - Chr00 GSAP gene 151 2235 . + . ID=dummy1;Name=dummy1
2024-09-19 11:34:29,316 - WARNING - Malformed GTF line 3 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP mRNA 151 2235 . + ID=dummy1.1;Parent=dummy1;Name=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 4 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP exon 151 2235 . + . ID=dummy1.1.exon1;Parent=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 5 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP CDS 151 2235 . + 0 ID=dummy1.1.cds1;Parent=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 6 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP gene 2412 4316 . + . ID=dummy2;Name=dummy2
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 7 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00 GSAP mRNA 2412 4316 . + . ID=dummy2.1;Parent=dummy2;Name=dummy2.1
LAST:
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638230 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP exon 1450283 1450513 . + . ID=dummy6432.1.exon1;Parent=dummy6432.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638231 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP CDS 1450283 1450513 . + 0 ID=dummy6432.1.cds1;Parent=dummy6432.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638232 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP gene 1465536 1465607 . - . ID=dummy6433;Name=dummy6433
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638233 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP mRNA 1465536 1465607 . - . ID=dummy6433.1;Parent=dummy6433;Name=dummy6433.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638234 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26 GSAP exon 1465536 1465607 . - . ID=dummy6433.1.exon1;Parent=dummy6433.1
2024-09-19 11:35:13,297 - ERROR - Input GTF seems to be corrupted (see warnings above).
2024-09-19 11:35:13,297 - ERROR - An attempt to correct this GTF was made, the result is written to /Path/FL_ALL/Annotation.corrected.gff3
2024-09-19 11:35:13,297 - ERROR - NB! some transcript / gene ids in the corrected annotation are modified.
2024-09-19 11:35:13,297 - ERROR - Provide a correct GTF by fixing the original input GTF or checking the corrected one.
It's not recognizing the GFF3 file.
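For anyone reading the log above: the warnings are consistent with the checker looking for GTF-style attributes (`gene_id "..."`) in the ninth column, while the file uses GFF3-style `key=value` pairs (`ID=...;Parent=...`). A rough way to see which style a line uses is sketched below. This is only an illustration and not IsoQuant's actual checker; the function name `attribute_style` is made up for this example, and it assumes the columns are tab-separated as in a real GFF/GTF file (the log prints them with spaces).

```python
import re

# GTF attributes look like:   gene_id "dummy1"; transcript_id "dummy1.1";
# GFF3 attributes look like:  ID=dummy1;Name=dummy1
GTF_ATTR = re.compile(r'(^|;)\s*\w+\s+"[^"]*"')
GFF3_ATTR = re.compile(r'\s*[^=;\s]+=')


def attribute_style(line):
    """Guess whether the 9th column uses GTF or GFF3 attribute syntax.

    Hypothetical helper for illustration; returns "gtf", "gff3" or "unknown".
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Comments, directives and malformed lines end up here.
        return "unknown"
    attrs = fields[8]
    if GTF_ATTR.search(attrs):
        return "gtf"
    if GFF3_ATTR.match(attrs):
        return "gff3"
    return "unknown"
```

Running it on the first annotation line from the log (with tabs restored) reports GFF3-style attributes, which would explain why no `gene_id` value is found on any line.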
@sanyalab
Thanks a lot! I will add GFF3 support to the internal checker.
So if gffutils converted it, you can run IsoQuant with --no_gtf_check as well.
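For reference, the gffutils preprocessing step mentioned earlier in the thread could look roughly like the sketch below. The exact options used in the thread were not posted, so the `create_db` arguments here (`force`, `keep_order`, `merge_strategy="merge"`) are assumptions, and the helper names `default_db_path` and `build_genedb` are made up for this example. gffutils is a third-party package and must be installed.

```python
from pathlib import Path


def default_db_path(annotation_path):
    """Hypothetical helper: Annotation.gff3 -> Annotation.db."""
    return str(Path(annotation_path).with_suffix(".db"))


def build_genedb(annotation_path, db_path=None):
    """Convert a GFF3/GTF annotation into a gffutils database file.

    The create_db options below are assumptions, not the settings
    used in this thread or by IsoQuant itself.
    """
    import gffutils  # third-party; imported lazily so the helper above works without it

    db_path = db_path or default_db_path(annotation_path)
    gffutils.create_db(
        annotation_path,
        db_path,
        force=True,              # overwrite an existing database file
        keep_order=True,         # keep attribute order as in the input
        merge_strategy="merge",  # merge features that share the same ID
    )
    return db_path
```

Calling `build_genedb("Annotation.gff3")` would write `Annotation.db`, which can then be given to `--genedb` in place of the GFF3 file, optionally together with `--no_gtf_check` as suggested above.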
GFF3 should work in IsoQuant 3.6.1 without warnings.