timeout -s KILL 300s and Error: Error while loading sequence

Open gforg34 opened this issue 2 months ago • 3 comments

Hi, and thank you for this great software. I’m working with a plant genome and I’ve encountered some issues related to EDTA_raw.pl and LTR_retriever. I have the following error:

Thu Oct  9 00:19:50 CEST 2025   Identify LTR retrotransposon candidates from scratch.

sh: line 1: 280369 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub20 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub20.harvest.scn 2> /dev/null
sh: line 1: 280601 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub60 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub60.harvest.scn 2> /dev/null
sh: line 1: 281524 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub170 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub170.harvest.scn 2> /dev/null
sh: line 1: 282764 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub323 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub323.harvest.scn 2> /dev/null
sh: line 1: 283042 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub360 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub360.harvest.scn 2> /dev/null
sh: line 1: 283150 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub372 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub372.harvest.scn 2> /dev/null
sh: line 1: 283315 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub391 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub391.harvest.scn 2> /dev/null
sh: line 1: 283960 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub473 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub473.harvest.scn 2> /dev/null
sh: line 1: 286031 Killed                  timeout -s KILL 300s /home/kgidiotis/.conda/envs/EDTA1/bin/gt ltrharvest -index chromosome_sub729 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > chromosome_sub729.harvest.scn 2> /dev/null

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
        Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
        Author: Shujun Ou ([email protected]) 10/11/2019

I installed EDTA via conda in a new environment. Here are my versions of EDTA and LTR_retriever:

ltr_retriever 3.0.4 hdfd78af_0 bioconda
edta 2.2.2 hdfd78af_1 bioconda

So, first of all, this is my script (run on SLURM):

EDTA_raw.pl \
    --genome $chromosome \
    --species others \
    --rmlib $rmlib \
    --type ltr \
    --threads 40 \
    --overwrite 1

As you can tell, I don't run EDTA against the entire genome but against a single chromosome, specifically from wheat (~800 Mbp long). These are the resources I requested: #SBATCH -n 20 (I also increased that to 40 in one of my launches) and #SBATCH --mem 180GB, which I believe is more than enough for a chromosome smaller than 1 Gb.

I started with 50 GB of memory and raised it after reading some of the related issues, such as #575. Moreover, following #188, I checked the .defalse files: they contain pass, false, and truncated entries, not only false ones, so presumably some LTRs are being identified. Furthermore, I checked the EDTA_raw script and I can't find any timeout -s KILL 300s to change, as #564 suggests. I don't know if I am looking at the wrong script.

The thing is that I get both errors, timeout -s KILL 300s and Error: Error while loading sequence. I know you suggest that the first one can be ignored, but I don't know whether it is related to my final outputs: of the three major outputs (LTR.intact.fa, LTR.raw.fa, LTR.intact.gff), LTR.intact.fa is missing. Any help would be appreciated.

Mini update: I realized that the scripts where the timeout -s KILL 300s can be adjusted are LTR_HARVEST and LTR_FINDER, am I right? They contain this line: my $timeout = 120; #set maximum time for a thread to run. After $timeout seconds, the child thread is killed. Do you think changing this value would solve the issue?

Regards, Kostas

gforg34 avatar Oct 09 '25 09:10 gforg34
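
For readers unfamiliar with the command in the log, here is a minimal sketch of what timeout -s KILL does. It uses sleep as a stand-in for a long-running gt ltrharvest job, and the 1s limit is only there to keep the demonstration short (the log above uses 300s). It is an illustration of the coreutils tool, not of how EDTA invokes it internally.

```shell
#!/usr/bin/env bash
# Stand-in for a slow ltrharvest job: sleep longer than the allowed time.
# "sleep 5" and the 1s limit are placeholders chosen for the demo.
if timeout -s KILL 1s sleep 5; then
    status=0
else
    status=$?
fi

# GNU coreutils documents an exit status of 128+9 = 137 when the
# KILL signal is the one sent on timeout.
echo "exit status: $status"
if [ "$status" -ne 0 ]; then
    echo "job did not finish within the time limit"
fi
```

A non-zero status here only means the job ran out of time; it says nothing about whether the input was valid.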

Same problem, although the species in my task has a large genome (>20 Gb).

Edit: I also found the following lines "my $timeout = 120; #set maximum time for a thread to run. After $timeout seconds, the child thread is killed." in the scripts of LTR_FINDER_parallel and LTR_HARVEST_parallel. But I'm not sure whether editing these lines would solve the above issue.

sakusankun avatar Oct 10 '25 16:10 sakusankun

Yes, you may increase the timeout value to allow the subprocesses to run longer, but the total program execution time will increase as well. Having a few such timeout kills is OK because the script has a recycling function that reruns the terminated regions, so that's not an issue.

Shujun

oushujun avatar Oct 11 '25 13:10 oushujun
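
To judge whether only "a few" jobs were killed, the kill messages can be counted in the job log. The sketch below is one way to do that with grep and sed. The log content is a shortened, illustrative copy of two lines from the report above written to a temporary file; in practice the path would be the real SLURM or EDTA stderr log, whose name is not given in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical log: shortened sample lines written to a temp file for the demo.
log=$(mktemp)
cat > "$log" <<'EOF'
sh: line 1: 280369 Killed                  timeout -s KILL 300s gt ltrharvest -index chromosome_sub20 -minlenltr 100 > chromosome_sub20.harvest.scn 2> /dev/null
sh: line 1: 280601 Killed                  timeout -s KILL 300s gt ltrharvest -index chromosome_sub60 -minlenltr 100 > chromosome_sub60.harvest.scn 2> /dev/null
Error: Error while loading sequence
EOF

# Number of log lines that report a killed ltrharvest job.
killed=$(grep -c 'Killed.*ltrharvest' "$log")

# Names of the affected chunks (the value that follows -index).
chunks=$(grep 'Killed.*ltrharvest' "$log" | sed -n 's/.* -index \([^ ]*\).*/\1/p')

echo "killed jobs: $killed"
echo "$chunks"
rm -f "$log"
```

Comparing this count with the total number of chunk files gives a rough idea of how much of the chromosome had to be rerun.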

Thank you for your reply Shujun,

I managed to overcome the sequence error by using only one library at a time. My original library was a merged dataset combining a classified TREP database with an unclassified one. However, when I run the process with both libraries together, the same error reappears. Could this issue be caused by the unclassified library, or by something else? Interestingly, when I run the unclassified library alone, the error does not occur.
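
Nothing in this thread confirms why the merged library fails. One thing that can be ruled out cheaply is a malformed merge, for example sequence IDs that occur in both source libraries or headers left without any sequence. The sketch below checks for both; the FASTA content and the header names are invented for illustration, and the check itself is an assumption about a possible cause, not a known fix.

```shell
#!/usr/bin/env bash
# Invented example library; replace "$lib" with the merged library path.
lib=$(mktemp)
cat > "$lib" <<'EOF'
>TE_A#LTR/Copia
ACGTACGT
>TE_B#Unknown
>TE_A#LTR/Copia
GGCCGGCC
EOF

# Sequence IDs (first word of each header) that appear more than once.
dups=$(grep '^>' "$lib" | awk '{print $1}' | sort | uniq -d)

# Headers that are not followed by any sequence line.
empty=$(awk '/^>/ { if (hdr != "" && len == 0) print hdr; hdr = $1; len = 0; next }
             { len += length($0) }
             END { if (hdr != "" && len == 0) print hdr }' "$lib")

echo "duplicated IDs: $dups"
echo "empty records:  $empty"
rm -f "$lib"
```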

I also noticed that in my single runs with each library, the file chr2B.fasta.mod.LTR.intact.raw.fa contains only two LTR elements, even though the corresponding GFF file reports many more identified LTRs. Do you know why this happens? My chromosome is relatively large (~800 Mbp), which can also be seen from the file sizes. As for the timeout warning, I decided to leave it as is. Outputs of the single runs:

-rw-r--r--. 1 kgidiotis g_crag 17M Oct 11 09:06 chr2B.fasta.mod.LTR.raw.fa
-rw-r--r--. 1 kgidiotis g_crag 6.3K Oct 11 09:06 chr2B.fasta.mod.LTR.intact.raw.fa
-rw-r--r--. 1 kgidiotis g_crag 19M Oct 11 09:06 chr2B.fasta.mod.LTR.intact.raw.gff3

gforg34 avatar Oct 12 '25 21:10 gforg34
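
When comparing a FASTA file with a GFF3 file, note that a GFF3 file can hold several lines per element (a parent feature plus its parts), so line counts and file sizes overstate the number of elements. The sketch below counts FASTA records and GFF3 features by type (column 3). The two small files and the feature type names are made up for the example and may differ from what EDTA actually writes.

```shell
#!/usr/bin/env bash
# Invented miniature inputs; point fa and gff at the real files instead.
workdir=$(mktemp -d)
fa="$workdir/example.LTR.intact.raw.fa"
gff="$workdir/example.LTR.intact.raw.gff3"

printf '>chr2B:100..5100\nACGTACGTACGT\n>chr2B:9000..15000\nTTGGCCAATTGG\n' > "$fa"

{
    printf '##gff-version 3\n'
    printf 'chr2B\tdemo\tLTR_retrotransposon\t100\t5100\t.\t+\t.\tID=e1\n'
    printf 'chr2B\tdemo\tlong_terminal_repeat\t100\t600\t.\t+\t.\tParent=e1\n'
    printf 'chr2B\tdemo\tlong_terminal_repeat\t4600\t5100\t.\t+\t.\tParent=e1\n'
    printf 'chr2B\tdemo\tLTR_retrotransposon\t9000\t15000\t.\t-\t.\tID=e2\n'
    printf 'chr2B\tdemo\tlong_terminal_repeat\t9000\t9500\t.\t-\t.\tParent=e2\n'
    printf 'chr2B\tdemo\tlong_terminal_repeat\t14500\t15000\t.\t-\t.\tParent=e2\n'
} > "$gff"

# Number of sequences in the FASTA file.
fasta_n=$(grep -c '^>' "$fa")

# Number of feature lines in the GFF3 file (comment lines excluded).
gff_lines=$(grep -vc '^#' "$gff")

# Number of parent elements only (type name is an assumption of this demo).
elements=$(awk -F'\t' '$3 == "LTR_retrotransposon" { n++ } END { print n + 0 }' "$gff")

echo "FASTA records: $fasta_n"
echo "GFF3 feature lines: $gff_lines"
echo "GFF3 parent elements: $elements"

# Per-type breakdown, useful for finding the right type to count.
awk -F'\t' '!/^#/ { n[$3]++ } END { for (t in n) print n[t], t }' "$gff" | sort -k2

rm -rf "$workdir"
```

If the parent-element count in the real GFF3 is still far above the number of FASTA records, the difference is not just a counting artifact and is worth reporting with both numbers.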