EarlGrey
RepeatModeler always runs with 1 thread, does not reuse earlier round results
I'm running EarlGrey on my chromosome-level plant assembly, which is 890 Mbp in size. The estimated repeat content from the first few runs of RepeatModeler is around 60%.
I am using the latest Singularity image, built with Dfam 3.7, and running on a Slurm cluster.
My problem is two-fold. First, my cluster has a job time limit of 5 days, and at the five-day mark RepeatModeler is only halfway through round 5. When I restart the run, RepeatModeler starts over at round 1, and I end up with two separate RepeatModeler run directories inside the RepeatModeler folder. Is there a way to make RepeatModeler reuse results from previous runs? That's the only way I can get the analysis to finish given the 5-day limit.
My other problem is that RepeatModeler only ever reports running with a single thread (see the log below). Does this apply only to the very first stage, with the supplied cores used correctly for the rest? I am not sure it makes sense for a highly contiguous genome of this size to take this long, so I want to make sure I'm not missing something.
For reference, round 4, which sampled 130 Mbp out of 890 Mbp (14.6%), took 21 hours on 64 cores with 512 GB of memory.
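To show where those figures come from, here is a small sketch of the arithmetic. It only uses the numbers quoted above; the variable names are my own and nothing here is RepeatModeler output.

```shell
# Figures reported above for round 4 (variable names are illustrative)
sampled_mbp=130
genome_mbp=890
round_hours=21

# Fraction of the assembly sampled in round 4 (awk handles the floating point)
pct=$(awk -v s="$sampled_mbp" -v g="$genome_mbp" 'BEGIN { printf "%.1f", 100 * s / g }')
echo "Round 4 sampled ${pct}% of the genome"

# Share of the 5-day job limit consumed by that single round
limit_hours=$((5 * 24))
echo "Round 4 used ${round_hours} of ${limit_hours} available hours"
```

So one round alone takes up a sizeable part of the 5-day window, which is why restarting from round 1 makes it impossible to finish.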
RepeatModeler Version 2.0.5
===========================
Using output directory = /workdir/earlgrey/m_canadense_EarlGrey/m_canadense_RepeatModeler/RM_352667.MonAug120602292024
Search Engine = rmblast 2.14.1+
Threads = 1
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.5
LTR Structural Analysis: Disabled [use -LTRStruct to enable]
Random Number Seed: 1723456948
Database = /workdir/earlgrey/m_canadense_EarlGrey/m_canadense_Database/m_canadense .
- Sequences = 26
- Bases = 890548923
Storage Throughput = fair ( 449.61 MB/s )
My cluster submission command is:
module load singularity
cd /work/wenglab/playground/matthew/genome
sbatch -p long -t 5-0 -n 64 --mem=512gb --wrap \
"singularity run -B $(pwd):/workdir /work/wenglab/testtube/matthew/singularity/earlgrey_latest.sif \
earlGrey -g /workdir/genome_cs.unmasked.fasta -s m_canadense -o /workdir/earlgrey -m -t 64"
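One thing I could check on my side, as a rough sketch rather than a confirmed diagnosis: sbatch's `-n` requests 64 tasks rather than 64 CPUs for one task, so I am assuming it is worth confirming how many CPUs a process inside the job can actually see. The snippet below is a hypothetical diagnostic, not part of EarlGrey, and would be run inside a job submitted with the same options.

```shell
# Hypothetical diagnostic; not part of the EarlGrey run itself.

# Number of CPUs this process may use (nproc respects CPU affinity limits)
visible_cpus=$(nproc)
echo "CPUs visible to this shell: ${visible_cpus}"

# Slurm exports these inside a job; they print as "unset" outside of one,
# and SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested
echo "SLURM_NTASKS=${SLURM_NTASKS:-unset}"
echo "SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-unset}"
echo "SLURM_JOB_NUM_NODES=${SLURM_JOB_NUM_NODES:-unset}"
```

If the visible CPU count is much lower than 64, or the job spans more than one node, that would point to the allocation rather than to RepeatModeler itself.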