perf
--gene-key Error
Hi !!
I was trying out the updated version of PERF and its new feature on one of my bacterial strains. However, I am getting the following error:
PERF -i ../raw/Tenacibaculum_discolor_gca_003664185.fa --format fasta -a -g ../raw/Tenacibaculum_discolor_gca_003664185.ASM366418v1.49.gff3 --anno-format GFF --gene-key ID
ERROR:
Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 12/12 [00:04<00:00, 2.97it/s]
GeneKeyError:
The attribute "gene_id" is not among the attributes for gene. Please select a different one.
The available ones are [Parent, Name, constitutive, ensembl_end_phase, ensembl_phase, exon_id, rank]
My GFF file contains the following attributes in the last column, but changing the gene key to ID or any other attribute isn't working:
ID=gene:C8N27_0080;biotype=protein_coding;description=cyclophilin family peptidyl-prolyl cis-trans isomerase;gene_id=C8N27_0080;logic_name=ena
When I use a GTF file, the error is:
Using length cutoff of 12
Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 14/14 [00:03<00:00, 3.66it/s]
Traceback (most recent call last):
File "/home/rohit/miniconda3/bin/PERF", line 8, in <module>
sys.exit(main())
File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 162, in main
ssr_native(args, length_cutoff=args.min_length)
File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 106, in ssr_native
fasta_ssrs(args, repeats_info)
File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/rep_utils.py", line 253, in fasta_ssrs
annotate(args)
File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 160, in annotate
gffObject = process_annofile(anno_file, annotype, gene_id)
File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 112, in process_annofile
attr_obj = process_attrs(attribute, annotype)
File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 66, in process_attrs
attr_obj[attrName] = attr[1].strip()
IndexError: list index out of range
I am not sure what is used in the background to process GFF/GTF files, but I strongly recommend integrating PERF with AGAT, which is an excellent tool for GFF/GTF file processing and handling.
Hi, sorry you ran into this issue. I can see that you have specified the gene identifier as ID. Can you please check whether any of the entries is missing an ID attribute? PERF uses an in-house script for parsing GFF and GTF files, and it may be running into a problem there. Thank you for the suggestion on integrating AGAT with PERF. I'll surely look into it.
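One way to check for entries without the chosen key is a small standalone script along these lines. This is a minimal sketch and not PERF's own parser: the function names `parse_gff3_attributes` and `features_missing_key` are hypothetical, and it assumes a tab-separated GFF3 file whose ninth column holds `key=value` pairs separated by semicolons.

```python
def parse_gff3_attributes(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # partition never raises, even when "=" is absent from the entry
        key, sep, value = part.partition("=")
        attrs[key.strip()] = value.strip() if sep else ""
    return attrs


def features_missing_key(lines, key, feature_type="gene"):
    """Return (line_number, attribute_field) for each feature of
    feature_type whose attribute column lacks the given key."""
    missing = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        if key not in parse_gff3_attributes(cols[8]):
            missing.append((lineno, cols[8]))
    return missing


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle.
    example = [
        "##gff-version 3\n",
        "contig1\tena\tgene\t1\t100\t.\t+\t.\tID=gene:C8N27_0080;gene_id=C8N27_0080\n",
        "contig1\tena\tgene\t200\t300\t.\t+\t.\tName=example\n",
    ]
    for lineno, attrs in features_missing_key(example, "ID"):
        print(f"line {lineno} has no ID: {attrs}")
```

For a file on disk, something like `with open(path) as fh: features_missing_key(fh, "ID")` should work, since the function only iterates over lines.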
Hi, based on your input files, I downloaded the genome and GFF of "Tenacibaculum_discolor" from NCBI and ran PERF on them.
Command:
PERF -i GCF_003664185.1_ASM366418v1_genomic.fna.gz -g GCF_003664185.1_ASM366418v1_genomic.gff.gz --gene-key ID
Using length cutoff of 12
Processing NZ_RCCS01000003.1: 100%|██████████████████████| 12/12 [00:00<00:00, 18.09it/s]
Generating annotations for identified repeats..
100%|██████████████████████████████████| 2759/2759 [00:00<00:00, 32419.89it/s]
Output:
NZ_RCCS01000004.1 1021 1033 AAAATT 12 - 2 TTTAAT gene-C8N27_RS00345427 1491 - Genic Promoter -594
NZ_RCCS01000004.1 1452 1466 AACAC 14 - 2 TGTGT gene-C8N27_RS00345427 1491 - Genic Promoter -1025
NZ_RCCS01000004.1 2143 2155 AAAATG 12 - 2 CATTTT gene-C8N27_RS003501668 4418 - Genic Promoter -475
NZ_RCCS01000004.1 2301 2313 AAACG 12 - 2 TTCGT gene-C8N27_RS003501668 4418 - Genic Promoter -633
The run does not seem to have hit any issue. Can you please check your input file?
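When checking the input file, it may help to list which attribute keys each feature type actually carries, since the keys reported in the GeneKeyError above (Parent, exon_id, rank, ...) look more like exon attributes than gene attributes. The following is a minimal sketch under the assumption of a tab-separated GFF3 file with `key=value` attributes; `attribute_keys_by_feature` is a hypothetical helper and has nothing to do with PERF's internals.

```python
from collections import defaultdict


def attribute_keys_by_feature(lines):
    """Map each feature type (column 3) to the sorted list of
    attribute keys seen in column 9 for that type."""
    keys = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        for part in cols[8].split(";"):
            key, sep, _ = part.strip().partition("=")
            if sep and key:
                keys[cols[2]].add(key)
    return {ftype: sorted(found) for ftype, found in keys.items()}


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle.
    example = [
        "contig1\tena\tgene\t1\t100\t.\t+\t.\tID=gene:C8N27_0080;biotype=protein_coding;gene_id=C8N27_0080\n",
        "contig1\tena\texon\t1\t100\t.\t+\t.\tParent=transcript:t1;Name=e1;rank=1\n",
    ]
    for ftype, found in attribute_keys_by_feature(example).items():
        print(ftype, found)
```

If the key passed to `--gene-key` does not show up under the `gene` feature type in this summary, that would point to the annotation file rather than to the command line.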