SLURM-specific behavior: repeat content dramatically reduced but no errors

Open laramiemckenna opened this issue 9 months ago • 12 comments

Our cluster moved from an LSF to a SLURM workload manager this year. I really enjoy using EDTA for our genome projects and ran it on a couple of assemblies while still working under LSF. Our total repeat content, as expected, was always in the same range as previous estimates from short-read data.

After the switch, I re-ran it on those same assemblies after making some small changes (fixing some minor SVs here and there), with the same install, same version, same script, and same resources as before. All of a sudden, the total repeat content dropped substantially, and I can't find any indication as to why in the log files or .err/.out files. This pattern has held for multiple genomes across multiple accounts, regardless of version or install type. Our IT department looked into it and said it was likely related to how EDTA uses resources under SLURM, but could not find the root of the problem either. It was especially confusing that EDTA wasn't using all of the resources allocated to it.

I don't know how to address this or begin troubleshooting it. Do you have any idea what might be causing this behavior?

I've placed some examples from one of the genomes I'm working on below in case it helps. Again, there's nothing in the .log, .err, or .out files; according to those, the job completed successfully.

For this genome, the expected total repeat content is ~30-33% based on previous runs of EDTA and estimates with GenomeScope.

This was the first attempt, using the same script and install as before the LSF/SLURM switch (values in this range of 6-7% also occurred when I kept the same resources, switched --ntasks to -c, and used Singularity instead):

#!/bin/sh
#SBATCH -e edta_test_%j.err
#SBATCH -o edta_test_%j.out
#SBATCH --job-name=edta_test
#SBATCH --time-min=120:00:00
#SBATCH --ntasks=25
#SBATCH --mem=80G
#SBATCH --partition=plant
#SBATCH --nodes=1

perl ~/mambaforge/envs/edta/bin/EDTA.pl --genome hap2_curated.FINAL.fasta --species others --anno 1 -t 25

and this is the SLURM output:

Job     	1312854 (COMPLETED)
Name    	edta_test
Submit  	sbatch edta.sh
Nodes   	plant - plant02
Input   	/dev/null
Output  	[path to]/edta_test_1312854.out
Error   	[path to]/edta_test_1312854.err
Resources 	CPU = 25 Memory = 81920
Start   	2023-08-01 13:37:40
End     	2023-08-01 18:03:34
Elapsed 	265.9 minutes
Limit   	28800 minutes
Exit Code   	SUCCESS (0)

Usage:
min       	CPU = 89437.26 sec (1 day, 0:50:37.26, 22.42 %)
min       	Mem = 13133.449 MB (16.03 %)
max       	CPU = 89437.26 sec (1 day, 0:50:37.26, 22.42 %)
max       	Mem = 13133.449 MB (16.03 %)
average       	CPU = 89437.26 sec (1 day, 0:50:37.26, 22.42 %)
average       	Mem = 13133.449 MB (16.03 %)
total       	CPU = 89437.26 sec (1 day, 0:50:37.26, 22.42 %)
total       	Mem = 13133.449 MB (16.03 %)

And here's the EDTA output:

Repeat Classes
==============
Total Sequences: 9
Total Length: 298741932 bp
Class                  Count        bpMasked     %masked
=====                  =====        ========     =======
LTR                    --           --           --
    Copia              5290         7521628      2.52%
    Gypsy              2279         2886915      0.97%
    unknown            2123         1271613      0.43%
TIR                    --           --           --
    CACTA              4085         2304875      0.77%
    Mutator            6566         2950914      0.99%
    PIF_Harbinger      1010         424154       0.14%
    Tc1_Mariner        144          83628        0.03%
    hAT                3496         2342891      0.78%
nonTIR                 --           --           --
    helitron           1595         830065       0.28%
                      ---------------------------------
    total interspersed 26588        20616683     6.90%

---------------------------------------------------------
Total                  26588        20616683     6.90%

Even though it wasn't using all of the memory provided to it, I wondered if it was a matter of resource allocation. After slowly increasing the resources (especially the number of CPUs per task), I was able to reproduce the total repeat content and ratios I expected with the run below. However, scaling the resources similarly for other, larger genomes did not work:

#!/bin/sh
#SBATCH -e edta_singularity_%j.err
#SBATCH -o edta_singularity_%j.out
#SBATCH --job-name=edta_singularity
#SBATCH --time-min=120:00:00
#SBATCH -c 100
#SBATCH --mem=300G
#SBATCH --partition=plant
#SBATCH --nodes=1

module load cluster/singularity/3.11.0

export PYTHONNOUSERSITE=1

singularity exec [path to]/EDTA.sif EDTA.pl --genome hap2_curated.FINAL.fasta --anno 1
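One thing worth ruling out here is a mismatch between the CPUs SLURM grants and the thread count EDTA is told to use. The script above passes no -t, so EDTA falls back to its own default thread count (reportedly 4; worth confirming against the EDTA.pl help text) rather than the 100 CPUs requested. The following is only a diagnostic sketch, not a known fix: `pick_threads` is a hypothetical helper, and using SLURM_NTASKS as a thread count assumes one CPU per task on a single node.

```shell
#!/bin/sh
# Sketch: derive a thread count from what SLURM actually granted.
# pick_threads is a hypothetical helper, not part of SLURM or EDTA.
pick_threads() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        # Set by SLURM when -c/--cpus-per-task is requested.
        echo "$SLURM_CPUS_PER_TASK"
    elif [ -n "${SLURM_NTASKS:-}" ]; then
        # Set by SLURM when --ntasks is requested.
        # Assumption: one CPU per task, all tasks on one node.
        echo "$SLURM_NTASKS"
    else
        # Fall back to the CPUs visible to this process.
        nproc
    fi
}

threads=$(pick_threads)

# Record what the job sees so it ends up in the .err file.
echo "SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-unset}" \
     "SLURM_NTASKS=${SLURM_NTASKS:-unset}" \
     "nproc=$(nproc) threads=$threads" >&2

# The EDTA call would then pass the value explicitly, for example:
# singularity exec [path to]/EDTA.sif EDTA.pl --genome hap2_curated.FINAL.fasta --anno 1 -t "$threads"
```

Comparing the logged values against the request in the #SBATCH header shows whether the job sees the CPUs it asked for before EDTA starts.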

Here's the SLURM job output (again, not actually using much of the resources allocated):

Job     	1322805 (COMPLETED)
Name    	edta_singularity
Submit  	sbatch edta.sh
Nodes   	plant - plant01
Input   	/dev/null
Output  	[path to]/edta_singularity_1322805.out
Error   	[path to]/edta_singularity_1322805.err
Resources 	CPU = 100 Memory = 307200
Start   	2023-08-04 11:47:48
End     	2023-08-04 19:17:14
Elapsed 	449.43 minutes
Limit   	28800 minutes
Exit Code   	SUCCESS (0)

Usage:
min       	CPU = 60512.09 sec (16:48:32.09, 2.24 %)
min       	Mem = 12943.504 MB (4.21 %)
max       	CPU = 60512.09 sec (16:48:32.09, 2.24 %)
max       	Mem = 12943.504 MB (4.21 %)
average       	CPU = 60512.09 sec (16:48:32.09, 2.24 %)
average       	Mem = 12943.504 MB (4.21 %)
total       	CPU = 60512.09 sec (16:48:32.09, 2.24 %)
total       	Mem = 12943.504 MB (4.21 %)
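The CPU percentages in the two job summaries can be reproduced as CPU seconds used divided by (allocated CPUs times elapsed seconds). Below is a small sketch of that arithmetic using the numbers from the summaries; `cpu_efficiency` and `busy_cores` are hypothetical helper names, not SLURM commands.

```shell
#!/bin/sh
# Sketch: recompute CPU utilisation from a SLURM job summary.
# Both helpers are hypothetical and not part of SLURM or EDTA.

# Arguments: CPU seconds used, CPUs allocated, elapsed wall-clock minutes.
# Prints the percentage of the allocated CPU time that was actually used.
cpu_efficiency() {
    awk -v used="$1" -v cpus="$2" -v mins="$3" \
        'BEGIN { printf "%.2f\n", 100 * used / (cpus * mins * 60) }'
}

# Arguments: CPU seconds used, elapsed wall-clock minutes.
# Prints the average number of cores kept busy over the whole job.
busy_cores() {
    awk -v used="$1" -v mins="$2" \
        'BEGIN { printf "%.2f\n", used / (mins * 60) }'
}

# Job 1312854: 25 CPUs, 265.9 minutes elapsed.
cpu_efficiency 89437.26 25 265.9    # prints 22.42
busy_cores 89437.26 265.9           # prints 5.61

# Job 1322805: 100 CPUs, 449.43 minutes elapsed.
cpu_efficiency 60512.09 100 449.43  # prints 2.24
busy_cores 60512.09 449.43          # prints 2.24
```

In other words, the 25-CPU job kept about 5.6 cores busy on average and the 100-CPU job about 2.2, so neither run came close to saturating its allocation.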

And finally, the EDTA .sum file output:

Repeat Classes
==============
Total Sequences: 9
Total Length: 298741932 bp
Class                  Count        bpMasked     %masked
=====                  =====        ========     =======
LTR                    --           --           --
    Copia              35182        32118649     10.75%
    Gypsy              19775        17604895     5.89%
    unknown            13685        5906158      1.98%
TIR                    --           --           --
    CACTA              27395        11192622     3.75%
    Mutator            42888        14389429     4.82%
    PIF_Harbinger      6605         2014056      0.67%
    Tc1_Mariner        763          218663       0.07%
    hAT                22646        10385771     3.48%
nonTIR                 --           --           --
    helitron           11927        3650322      1.22%
                      ---------------------------------
    total interspersed 180866       97480565     32.63%

---------------------------------------------------------
Total                  180866       97480565     32.63%
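For reference, the %masked totals in both summaries are just bpMasked divided by the total assembly length, so the two runs can be compared directly. A minimal sketch of that arithmetic with the values from the two tables; `pct_masked` and `fold_change` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: recompute %masked and compare the two runs.
# Both helpers are hypothetical and not part of EDTA.

# Arguments: bp masked, total assembly length in bp.
pct_masked() {
    awk -v bp="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", 100 * bp / total }'
}

# Arguments: larger value, smaller value. Prints their ratio.
fold_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

total_len=298741932   # "Total Length" from both summaries

pct_masked 20616683 "$total_len"   # 25-CPU run, prints 6.90
pct_masked 97480565 "$total_len"   # 100-CPU run, prints 32.63
fold_change 97480565 20616683      # prints 4.73
```

The assembly length is identical in both runs, so the roughly 4.7-fold difference comes entirely from how many bases were annotated as repeats.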

laramiemckenna, Sep 26 '23 12:09