
--gene-key Error

Open Rohit-Satyam opened this issue 2 years ago • 2 comments

Hi !!

I was trying out the updated version of PERF and its new feature on one of my bacterial strains. However, I am getting the following error:

PERF -i ../raw/Tenacibaculum_discolor_gca_003664185.fa --format fasta -a -g ../raw/Tenacibaculum_discolor_gca_003664185.ASM366418v1.49.gff3 --anno-format GFF --gene-key ID

ERROR:

Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 12/12 [00:04<00:00,  2.97it/s]

GeneKeyError:
The attribute "gene_id" is not among the attributes for gene. Please select a different one.
The available ones are [Parent, Name, constitutive, ensembl_end_phase, ensembl_phase, exon_id, rank]

My GFF file contains the following attributes in the last column, but changing the gene key to ID or any other attribute isn't working:

ID=gene:C8N27_0080;biotype=protein_coding;description=cyclophilin family peptidyl-prolyl cis-trans isomerase;gene_id=C8N27_0080;logic_name=ena

When I use a GTF file, the error is:

Using length cutoff of 12
Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 14/14 [00:03<00:00,  3.66it/s]
Traceback (most recent call last):
  File "/home/rohit/miniconda3/bin/PERF", line 8, in <module>
    sys.exit(main())
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 162, in main
    ssr_native(args, length_cutoff=args.min_length)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 106, in ssr_native
    fasta_ssrs(args, repeats_info)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/rep_utils.py", line 253, in fasta_ssrs
    annotate(args)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 160, in annotate
    gffObject = process_annofile(anno_file, annotype, gene_id)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 112, in process_annofile
    attr_obj = process_attrs(attribute, annotype)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 66, in process_attrs
    attr_obj[attrName] = attr[1].strip()
IndexError: list index out of range
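The last frame of the traceback (`attr[1].strip()`) suggests that one attribute field was split into fewer than two parts, for example a field with a key but no value. The sketch below is not PERF's actual code; `parse_gtf_attributes` is a hypothetical helper showing one way to parse a GTF attribute column while collecting such fields instead of crashing. It assumes the usual `key "value";` layout and does not handle semicolons inside quoted values.

```python
def parse_gtf_attributes(column9):
    """Parse a GTF attribute column into a dict.

    Returns (attrs, malformed), where malformed lists the fields that
    could not be split into a key and a value.
    """
    attrs = {}
    malformed = []
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after a trailing ';'
        parts = field.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            malformed.append(field)  # key without a value
            continue
        key, value = parts
        attrs[key] = value.strip().strip('"')
    return attrs, malformed


if __name__ == "__main__":
    example = 'gene_id "C8N27_0080"; gene_biotype "protein_coding"; orphan;'
    print(parse_gtf_attributes(example))
```

Running a check like this over the ninth column of each GTF line would show which entries, if any, carry a field the parser cannot split.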

I am not sure what is being used in the background to process GFF/GTF files, but my strong recommendation is to integrate PERF with AGAT, which is an excellent tool for GTF/GFF file processing and handling.

Rohit-Satyam, Oct 26 '21 11:10

Hi, sorry you ran into this issue. I can see that you have specified the gene identifier as ID. Can you please check whether any of the entries is missing an ID attribute? PERF uses an in-house script for parsing GFF and GTF files, and it may be facing an issue here. Thank you for the suggestion on integrating AGAT with PERF. I'll surely look into it.
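One way to run that check is sketched below. This is a minimal, hypothetical script, not part of PERF: the function names are made up for illustration, and it assumes a tab-separated GFF3 file with nine columns and `key=value` attributes separated by `;`. It only looks at features whose type is exactly `gene`; other gene-like types in the file (if any) would need to be passed in separately.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a dict, skipping malformed fields."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue  # empty or malformed field, e.g. after a trailing ';'
        key, value = field.split("=", 1)
        attrs[key.strip()] = value.strip()
    return attrs


def features_missing_key(lines, key, feature_type="gene"):
    """Return (line_number, raw_line) for feature_type rows lacking `key`."""
    missing = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != feature_type:
            continue
        if key not in parse_gff3_attributes(columns[8]):
            missing.append((number, line.rstrip("\n")))
    return missing


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_gff.py annotation.gff3 ID
    with open(sys.argv[1]) as handle:
        for number, row in features_missing_key(handle, sys.argv[2]):
            print(number, row, sep="\t")
```

If this prints nothing for the chosen key, every `gene` row carries that attribute and the problem is more likely in how the file is being parsed.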

avvaruakshay, Oct 29 '21 10:10

Hi, based on your input files, I downloaded the genome and GFF of "Tenacibaculum_discolor" from NCBI and ran PERF on them.

Command:

PERF -i GCF_003664185.1_ASM366418v1_genomic.fna.gz -g GCF_003664185.1_ASM366418v1_genomic.gff.gz --gene-key ID

Using length cutoff of 12
Processing NZ_RCCS01000003.1: 100%|██████████████████████| 12/12 [00:00<00:00, 18.09it/s]

Generating annotations for identified repeats..
100%|██████████████████████████████████| 2759/2759 [00:00<00:00, 32419.89it/s]

Output:

NZ_RCCS01000004.1	1021	1033	AAAATT	12	-	2	TTTAAT	gene-C8N27_RS00345	427	1491	-	Genic	Promoter	-594
NZ_RCCS01000004.1	1452	1466	AACAC	14	-	2	TGTGT	gene-C8N27_RS00345	427	1491	-	Genic	Promoter	-1025
NZ_RCCS01000004.1	2143	2155	AAAATG	12	-	2	CATTTT	gene-C8N27_RS00350	1668	4418	-	Genic	Promoter	-475
NZ_RCCS01000004.1	2301	2313	AAACG	12	-	2	TTCGT	gene-C8N27_RS00350	1668	4418	-	Genic	Promoter	-633

PERF does not seem to have faced any issue with these files. Can you please check your input file?

avvaruakshay, Oct 29 '21 11:10