Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Open lisajq opened this issue 3 years ago • 3 comments

Hi Shujun,

I ran into two problems again.

  1. Although I changed the flanking_filter.pl file as you advised in addf414, the run still fails with the same error. I can ignore this one.
  2. The run repeatedly prints these two warnings:
     Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
     Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

This affects the output, for example ref.fa.mod.EDTA.TEanno.sum:

Class               Count    bpMasked    %masked
=====               =====    ========    =======
LTR                 --       --          --
    Copia           155      76887       0.03%
    Gypsy           3693     1435773     0.61%
    unknown         96       66209       0.03%
TIR                 --       --          --
    CACTA           4617     1100287     0.47%
    Mutator         16596    5079629     2.15%
    PIF_Harbinger   53       24853       0.01%
    Tc1_Mariner     803      304397      0.13%
    hAT             243      68435       0.03%
nonLTR              --       --          --
    LINE_element    2393     1355450     0.57%
nonTIR              --       --          --
    helitron        13276    3308763     1.40%
repeat_region       27289    6369647     2.69%
                    ---------------------------------
total interspersed  69214    19190330    8.12%

Total               69214    19190330    8.12%

Could you help me solve the second problem?

The full run log is below:

########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.4
Shujun Ou ([email protected])

########################################################

08:47:20 CST Dependency checking: All passed!

08:47:29 CST The longest sequence ID in the genome contains 111 characters, which is longer than the limit (15)
Trying to reformat seq IDs...
Attempt 1...
08:47:32 CST Seq ID conversion successful!

A custom library SINEs.fa is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.

08:47:32 CST Obtain raw TE libraries using various structure-based programs:
08:47:32 CST EDTA_raw: Check dependencies, prepare working directories.

08:47:39 CST Start to find LTR candidates.

08:47:39 CST Identify LTR retrotransposon candidates from scratch.

08:53:09 CST Finish finding LTR candidates.

08:53:09 CST Start to find TIR candidates.

08:53:09 CST Identify TIR candidates from scratch.

Species: others
Thread 1 terminated abnormally: Illegal division by zero at /home/soft/anaconda2/pkgs/edta-1.9.5-0/share/EDTA/util/flanking_filter.pl line 214.
14:03:07 CST Finish finding TIR candidates.

14:03:07 CST Start to find Helitron candidates.

14:03:07 CST Identify Helitron candidates from scratch.

16:39:10 CST Finish finding Helitron candidates.

16:39:10 CST Execution of EDTA_raw.pl is finished!

16:39:10 CST Obtain raw TE libraries finished.
All intact TEs found by EDTA:
ref.fa.mod.EDTA.intact.fa
ref.fa.mod.EDTA.intact.gff3

16:39:10 CST Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library:

16:47:48 CST EDTA advcance filtering finished.

16:47:48 CST Perform EDTA final steps to generate a non-redundant comprehensive TE library:

			Use RepeatModeler to identify any remaining TEs that are missed by structure-based methods.

12:33:54,009 -INFO- VARS: {'sequence': 'ref.fa.mod.RM.consensi.fa', 'hmm_database': 'rexdb', 'seq_type': 'nucl', 'prefix': 'ref.fa.mod.RM.consensi.fa.rexdb', 'force_write_hmmscan': False, 'processors': 4, 'tmp_dir': './tmp', 'min_coverage': 20, 'max_evalue': 0.001, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'no_cleanup': False, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
12:33:54,009 -INFO- checking dependencies:
12:33:54,029 -INFO- hmmer 3.3.1 OK
12:33:54,153 -INFO- blastn 2.10.0+ OK
12:33:54,154 -INFO- check database rexdb
12:33:54,154 -INFO- db path: /home/soft/anaconda2/envs/py38/lib/python3.8/site-packages/TEsorter/database
12:33:54,154 -INFO- db file: REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
12:33:54,154 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
12:33:54,154 -INFO- Start classifying pipeline
12:33:54,181 -INFO- total 662 sequences
12:33:54,181 -INFO- translating ref.fa.mod.RM.consensi.fa in six frames
/home/soft/software/lib/python3.5/site-packages/Bio/Seq.py:2306: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
warnings.warn("Partial codon, len(sequence) not a multiple of three. "
12:33:54,746 -INFO- HMM scanning against /home/soft/anaconda2/envs/py38/lib/python3.8/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
12:33:54,813 -INFO- Creating server instance (pp-1.6.4.4)
12:33:54,813 -INFO- Running on Python 3.8.5 linux
12:33:55,268 -INFO- pp local server started with 4 workers
12:33:55,317 -INFO- Task 0 started
12:33:55,318 -INFO- Task 1 started
12:33:55,319 -INFO- Task 2 started
12:33:55,321 -INFO- Task 3 started
12:34:06,727 -INFO- generating gene anntations
12:34:06,791 -INFO- 32 sequences classified by HMM
12:34:06,791 -INFO- see protein domain sequences in ref.fa.mod.RM.consensi.fa.rexdb.dom.faa and annotation gff3 file in ref.fa.mod.RM.consensi.fa.rexdb.dom.gff3
12:34:06,791 -INFO- classifying the unclassified sequences by searching against the classified ones
12:34:06,823 -INFO- using the 80-80-80 rule
12:34:06,823 -INFO- run CMD: makeblastdb -in ./tmp/pass1_classified.fa -dbtype nucl
12:34:06,943 -INFO- run CMD: blastn -query ./tmp/pass1_unclassified.fa -db ./tmp/pass1_classified.fa -out ./tmp/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 4
12:34:07,311 -INFO- 6 sequences classified in pass 2
12:34:07,312 -INFO- total 38 sequences classified.
12:34:07,312 -INFO- see classified sequences in ref.fa.mod.RM.consensi.fa.rexdb.cls.tsv
12:34:07,312 -INFO- writing library for RepeatMasker in ref.fa.mod.RM.consensi.fa.rexdb.cls.lib
12:34:07,350 -INFO- writing classified protein domains in ref.fa.mod.RM.consensi.fa.rexdb.cls.pep
12:34:07,353 -INFO- Summary of classifications:
Order   Superfamily   # of Sequences   # of Clade Sequences   # of Clades   # of full Domains
LTR     Copia         4                1                      1             0
LTR     Gypsy         3                1                      1             0
LINE    unknown       20               0                      0             0
TIR     PiggyBac      1                0                      0             0
TIR     Tc1_Mariner   10               0                      0             0
12:34:07,353 -INFO- Pipeline done.
12:34:07,353 -INFO- cleaning the temporary directory ./tmp
Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

12:43:10 CST Combine the high-quality TE library SINEs.fa with the EDTA library:

12:43:22 CST EDTA final stage finished! You may check out:
The final EDTA TE library: ref.fa.mod.EDTA.TElib.fa
Family names of intact TEs have been updated by SINEs.fa: ref.fa.mod.EDTA.intact.gff3
Comparing to the provided library, EDTA found these novel TEs: ref.fa.mod.EDTA.TElib.novel.fa
The provided library has been incorporated into the final library: ref.fa.mod.EDTA.TElib.fa

12:43:22 CST Perform post-EDTA analysis for whole-genome annotation:

12:43:22 CST Homology-based annotation of TEs using ref.fa.mod.EDTA.TElib.fa from scratch.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: TIR/PiggyBac not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

Warning: Unspecified not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.

13:01:15 CST TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 8.12%): ref.fa.mod.EDTA.TEanno.gff3
Whole-genome TE annotation summary: ref.fa.mod.EDTA.TEanno.sum
Low-threshold TE masking for MAKER gene annotation (masked: 6.24%): ref.fa.mod.MAKER.masked

Best Cuzz lisa

lisajq, Jan 18 '21 02:01
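
To see at a glance which classifications triggered the fallback, the repeated warnings in the log above can be tallied. This is a minimal Python sketch, not part of EDTA; the regular expression is copied from the warning text in the log, and the `sample` string stands in for a saved log file.

```python
import re
from collections import Counter

# Pattern taken from the warning text printed in the log above.
WARNING_RE = re.compile(r"Warning: (\S+) not found in the TE_SO database")


def tally_unmapped_classes(log_text):
    """Count how often each classification triggered the TE_SO fallback warning."""
    return Counter(WARNING_RE.findall(log_text))


if __name__ == "__main__":
    # Stand-in for the contents of a saved EDTA log.
    sample = (
        "Warning: TIR/PiggyBac not found in the TE_SO database, will use the "
        "general term 'repeat_region SO:0000657' to replace it.\n"
        "Warning: Unspecified not found in the TE_SO database, will use the "
        "general term 'repeat_region SO:0000657' to replace it.\n"
        "Warning: TIR/PiggyBac not found in the TE_SO database, will use the "
        "general term 'repeat_region SO:0000657' to replace it.\n"
    )
    for name, count in tally_unmapped_classes(sample).most_common():
        print(name, count)
```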

Hi Lisa,

I need the sequence causing error 1 to further diagnose and fix it.

Issue 2 is not an error. The warning says that "TIR/PiggyBac" is not in the TE_SO database, so EDTA cannot fill in a specific term and falls back to the general one. It looks like you have quite a lot of PiggyBac elements in your genome. "Unspecified" is a repetitive sequence found by RepeatModeler whose category is unknown, so naming it "repeat_region" makes sense. I can fix the code so that "Unspecified" is handled automatically and silently next time. The PiggyBac element, however, needs to be in the Sequence Ontology system first. You can open an issue here to add it: https://github.com/The-Sequence-Ontology/SO-Ontologies/issues

The workaround is to find the PiggyBac sequence name, replace its classification in the gff3 file yourself, and then summarize the file again. This saves you the effort of getting a new SO term created, but it also makes your gff3 file non-standard and will cause issues when other programs need the correct SO term.

Best, Shujun

oushujun, Jan 18 '21 03:01
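
The manual workaround described in the comment above could be scripted roughly as follows. This is a sketch under assumptions, not EDTA's own code: it assumes the annotation is tab-separated GFF3 with the feature type in column 3 and a `Classification=` key in the attribute column, and the replacement label `PiggyBac_TIR_transposon` is a hypothetical, non-SO name chosen only for illustration. Check a few lines of your own gff3 before relying on it.

```python
def relabel_gff3_line(line, classification, new_type):
    """Return the GFF3 line with column 3 replaced when its Classification matches.

    Assumes nine tab-separated columns and a 'Classification=' attribute
    (an assumption about the gff3 layout, verify against your file).
    """
    if line.startswith("#") or not line.strip():
        return line  # keep comments and blank lines untouched
    ending = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a standard feature line, leave as-is
    attrs = dict(
        field.split("=", 1) for field in cols[8].split(";") if "=" in field
    )
    if attrs.get("Classification") == classification:
        cols[2] = new_type
    return "\t".join(cols) + ending


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    example = (
        "chr1\tEDTA\trepeat_region\t100\t500\t.\t+\t.\t"
        "ID=TE_1;Classification=TIR/PiggyBac;Sequence_ontology=SO:0000657\n"
    )
    # 'PiggyBac_TIR_transposon' is a hypothetical label, not an SO term.
    print(relabel_gff3_line(example, "TIR/PiggyBac", "PiggyBac_TIR_transposon"), end="")
```

As noted in the comment, a file edited this way no longer uses valid SO terms in column 3, so it is safer to write the result to a separate copy and keep the original gff3.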

It is unclear whether this and #178 are errors, and whether they negatively affect the downstream .gff that EDTA outputs. How does this problem manifest downstream? Do these repeats still end up in a .gff somewhere in the EDTA output?

I'm asking because I need to determine what percentage of some genomes consists of repeats, TEs, et cetera. However, I don't necessarily care whether the exact class or annotation is correct. So, having the entry present as an "unclassified" category in the .gff sounds ideal.

conchoecia avatar Apr 12 '21 22:04 conchoecia

@conchoecia If the repeat type is unknown to the Sequence Ontology system, it is classified as "repeat_region" (SO:0000657), a more general term that is available in the SO system. So if you don't care whether this type is classified accurately, you can ignore the warning. These elements are still counted in the final report.

oushujun, Apr 13 '21 09:04
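
For the total-percentage question raised above, the masked fraction can also be cross-checked directly from a GFF3 file, regardless of how each feature is labelled. The following is a minimal Python sketch, independent of EDTA; it assumes standard GFF3 columns (sequence ID in column 1, 1-based inclusive start and end in columns 4 and 5), merges overlapping or adjacent features so no base is counted twice, and takes the genome size as an input you supply. EDTA's own .sum report may count differently.

```python
from collections import defaultdict


def masked_bp(gff3_lines):
    """Sum the bases covered by any feature, merging overlaps per sequence."""
    intervals = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed lines
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))

    total = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total


def percent_masked(gff3_lines, genome_size):
    """Percentage of the genome covered by the features in the GFF3 lines."""
    return 100.0 * masked_bp(gff3_lines) / genome_size


if __name__ == "__main__":
    # Made-up features, for illustration only.
    demo = [
        "chr1\tEDTA\trepeat_region\t1\t100\t.\t+\t.\tID=a\n",
        "chr1\tEDTA\trepeat_region\t51\t150\t.\t+\t.\tID=b\n",
        "chr2\tEDTA\thelitron\t1\t50\t.\t-\t.\tID=c\n",
    ]
    print(masked_bp(demo), percent_masked(demo, 1000))
```

Because the count uses only coordinates, features that fell back to "repeat_region" are included in the same way as specifically classified ones.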