EDTA
PanEDTA LINE Detection
Hello Shujun,
Hope you are doing well. I am writing because I had issues with LINE detection in panEDTA, and I hope this helps anyone else who runs into the same problem. It is not a bug, just something that I think folks could easily overlook. When I ran panEDTA (v2.1.0) on its own, without first generating results with regular EDTA, it did not find any LINE elements in my genomes.
After some testing, I think this is because the panEDTA script, by default, calls EDTA.pl without the --sensitive 1 option. The sensitive option runs RepeatModeler. I also observed that when I ran regular EDTA.pl on a genome without the sensitive option, it did not recover any LINEs. In short, RepeatModeler seems to have been doing the heavy lifting for LINE detection in my strawberry genomes; without it, I wasn't detecting any LINEs. Jordan B, a postdoc in Pat's lab, also had the same LINE issue with some Camelina genomes.
In my case, I fixed the issue by running EDTA individually on each genome with the --sensitive 1 option and then completing the pangenome annotation with panEDTA. That approach worked: LINEs were included in my final annotation.
This problem only arises if users rely on panEDTA to perform every step of their pangenome annotation. It can easily be sidestepped if users create the individual annotations with the --sensitive 1 option first.
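A minimal sketch of that workaround in Python, assuming EDTA.pl is on PATH and accepts --genome, --sensitive 1, and --threads. It builds one EDTA.pl command per genome and, by default, only prints the commands (a dry run). The genome file names and thread count are hypothetical placeholders, and how panEDTA then picks up these precomputed results is not shown here.

```python
import shlex
import subprocess


def edta_command(genome, threads=8):
    """Build an EDTA.pl command line with --sensitive 1 (which runs RepeatModeler)."""
    return [
        "EDTA.pl",
        "--genome", genome,
        "--sensitive", "1",  # the option panEDTA did not pass in this report
        "--threads", str(threads),
    ]


def run_all(genomes, threads=8, dry_run=True):
    """Print (dry run) or execute one EDTA.pl job per genome; return the commands."""
    commands = [edta_command(g, threads) for g in genomes]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Stop at the first failing genome instead of silently continuing.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical genome file names; replace with your own assemblies.
    run_all(["genome_A.fa", "genome_B.fa"], threads=16)
```

Running the jobs sequentially keeps the sketch simple; on a cluster you would more likely submit each command as its own job.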
Sincerely, Scott Teresi
Hi Scott,
Thank you for reporting this! Can you please update EDTA to 2.2.0 and test panEDTA again? The new version includes many big changes that improve SINE/LINE annotation.
Thanks, Shujun
Any luck?
Shujun
My apologies for the delayed response, Shujun. I will update EDTA this weekend or early next week and follow up.
Hi Shujun,
I am actually having additional trouble with EDTA 2.2.0 now that I have updated. I ran into a lot of errors getting conda to resolve dependencies during installation, so I opted for Singularity instead. That installation worked, but now when I run my genomes the LINE result file comes back empty (0 bp), and TIR detection fails without crashing the pipeline. Here is a sample output:
Mon Apr 15 15:59:43 EDT 2024 Start to find LINE candidates.
Mon Apr 15 15:59:43 EDT 2024 Identify LINE retrotransposon candidates from scratch.
Tue Apr 16 07:25:21 EDT 2024 Warning: The LINE result file has 0 bp!
Tue Apr 16 07:25:21 EDT 2024 Start to find TIR candidates.
Tue Apr 16 07:25:21 EDT 2024 Identify TIR candidates from scratch.
Species: others
find: ./TIR-Learner-+-TIRvish.gff3: No such file or directory
Traceback (most recent call last):
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
The test included in your README mostly works: the dependencies check out, but I get a similar set of warnings:
Tue Apr 16 12:02:56 EDT 2024 Start to find LTR candidates.
Tue Apr 16 12:02:56 EDT 2024 Identify LTR retrotransposon candidates from scratch.
Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Tue Apr 16 12:03:25 EDT 2024 Finish finding LTR candidates.
Tue Apr 16 12:03:25 EDT 2024 Start to find SINE candidates.
Tue Apr 16 12:04:07 EDT 2024 Warning: The SINE result file has 0 bp!
Tue Apr 16 12:04:07 EDT 2024 Start to find LINE candidates.
Tue Apr 16 12:04:07 EDT 2024 Identify LINE retrotransposon candidates from scratch.
Tue Apr 16 12:05:15 EDT 2024 Warning: The LINE result file has 0 bp!
Tue Apr 16 12:05:15 EDT 2024 Start to find TIR candidates.
Tue Apr 16 12:05:15 EDT 2024 Identify TIR candidates from scratch.
Species: others
Traceback (most recent call last):
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 81, in __init__
self.execute()
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 121, in execute
self.execute_M4()
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 672, in execute_M4
self["base"] = CNN_predict.execute(self)
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 114, in execute
df = predict(df, TIRLearner_instance.genome_file_path,
File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 62, in predict
model = load_model(path_to_model)
File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model
return legacy_sm_saving_lib.load_model(
File "/usr/local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 278, in function_def_to_graph_def
input_shape = input_shape.as_proto()
AttributeError: as_proto
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /mnt/ufs18/rs-004/edgerpat_lab/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.
Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou 10/11/2019
mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!
Tue Apr 16 12:05:31 EDT 2024 Start to find Helitron candidates.
Tue Apr 16 12:05:31 EDT 2024 Identify Helitron candidates from scratch.
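When comparing runs like the ones above, it can help to pull out which steps came back empty and whether a Python traceback occurred. The sketch below assumes the EDTA output was captured to a text file and relies only on the "Warning: The ... result file has 0 bp!" and "Traceback (most recent call last):" lines shown in these logs; the command-line usage is illustrative.

```python
import re
import sys

# Matches EDTA warnings such as "Warning: The LINE result file has 0 bp!"
EMPTY_RESULT = re.compile(r"Warning: The (\w+) result file has 0 bp!")


def empty_steps(log_text):
    """Return the TE classes whose result file was reported as 0 bp, in log order."""
    return EMPTY_RESULT.findall(log_text)


def has_traceback(log_text):
    """True if a Python traceback (e.g. from TIR-Learner) appears in the log."""
    return "Traceback (most recent call last):" in log_text


if __name__ == "__main__":
    # Usage: python check_edta_log.py run1.log run2.log ...
    for path in sys.argv[1:]:
        with open(path) as fh:
            text = fh.read()
        print(path, "empty:", empty_steps(text), "traceback:", has_traceback(text))
```

On the README test output above, this would flag the SINE, LINE, and TIR steps together with the TIR-Learner traceback.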
Currently re-trying with a fresh Anaconda installation and conda environment.
The yml file should help with the conda installation. I don’t think the Singularity version is working at the moment.
Shujun
Hi Shujun,
I got the latest version of EDTA to complete the system test and am still running it on my genomes; I will report back. For this upgrade I had to install bedtools and samtools on top of the conda environment. I did not see them specified in the yml file, and I could not get the basic install to work without them. Perhaps I made a mistake during installation, or perhaps they were preinstalled on your computing cluster and so were missed. Either way, I hope this helps!
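Since missing executables were the sticking point here, a quick pre-flight check can confirm they are visible before launching a long run. This is a minimal sketch that only inspects PATH with shutil.which; the tool list contains just the two programs named above and is meant to be extended, and it does not verify versions or modify the yml file.

```python
import shutil

# Tools that had to be added by hand in this thread; extend as needed.
REQUIRED_TOOLS = ["bedtools", "samtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Run it inside the activated conda environment so that PATH reflects what EDTA will actually see.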