Some jobs failed when run several at the same time

Open HeloiseMuller opened this issue 1 year ago • 2 comments

Expected Behavior

Run mmseqs search with an array of jobs.

Current Behavior

As a test, I began with an array of only 5 jobs. Two of them failed, each with a different error message. When I run them individually, they work. This behaviour is similar to issue #239.

Steps to Reproduce (for bugs)

sarray -J mmseq --mail-type=ARRAY_TASKS,FAIL commandMMseqs --%=5

where commandMMseqs contains:

sbatch command_mmseq2_model.sbatch GCA_018105865.1 GCA_901001135.2
sbatch command_mmseq2_model.sbatch GCA_009193005.1 GCA_901001135.2
sbatch command_mmseq2_model.sbatch GCA_905160935.1 GCA_901001135.2
sbatch command_mmseq2_model.sbatch GCA_019095985.1 GCA_901001135.2
sbatch command_mmseq2_model.sbatch GCA_001703475.1 GCA_901001135.2

command_mmseq2_model.sbatch contains:

#!/bin/bash
#
#SBATCH -N 1                         # number of nodes
#SBATCH -c 20                         # number of cores on that same node
#SBATCH --mem 50G                    # memory for all cores combined
#SBATCH -J mmseq

module load system/Miniconda3-4.7.10 
module load bioinfo/mmseqs2-v13.45111

mmseqs search copies/${1}.TEs.fasta.dbm copies/${2}.TEs.fasta.dbm mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out tmp -s 5.7 --search-type 3 --threads 20 --max-seqs 50 
mmseqs filterdb mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.bestHit --extract-lines 1 
mmseqs convertalis copies/${1}.TEs.fasta.dbm copies/${2}.TEs.fasta.dbm mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.bestHit mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.bestHit.tab
rm mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.*[0-9]* &
awk '{if ($3>=0.75 && $4>=300 && $12>=200) print $0}' mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.bestHit.tab > mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.bestHit.tab.filtered
rm mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out.bestHit.tab

MMseqs Output (for bugs)

One job fails with:

Could not delete /work/jpeccoud/HeloiseMuller/tmp/latest!

Another job fails with:

Could not create symlink of tmp/14012808946536109652!

Context

I suppose some jobs try to overwrite each other's files in tmp, as in issue #239? Since you were able to fix it for mmseqs rbh, I thought it should be fixable for mmseqs search too?

Sep 19 '22 08:09 HeloiseMuller

Try giving every job a unique tmp folder (e.g., tmp_${SLURM_JOB_ID}_${SLURM_ARRAY_TASK_ID}).
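
As a rough sketch of that suggestion, the tmp directory name could be built at the top of command_mmseq2_model.sbatch. `SLURM_JOB_ID` and `SLURM_ARRAY_TASK_ID` are set by Slurm; the helper name `job_tmp_dir` and the fallback values (shell PID and 0) are assumptions for illustration, covering runs where one of the variables is unset.

```shell
#!/bin/bash
# Sketch: build a per-job tmp directory name so concurrent jobs never share one.
# job_tmp_dir is a hypothetical helper. The fallbacks (shell PID, then 0) are an
# assumption for cases where the Slurm variables are not set.
job_tmp_dir() {
    printf 'tmp_%s_%s\n' "${SLURM_JOB_ID:-$$}" "${SLURM_ARRAY_TASK_ID:-0}"
}

TMP_DIR="$(job_tmp_dir)"
echo "Using tmp directory: ${TMP_DIR}"

# The search line from the script in this issue would then pass ${TMP_DIR}
# in place of the shared "tmp" folder:
# mmseqs search "copies/${1}.TEs.fasta.dbm" "copies/${2}.TEs.fasta.dbm" \
#     "mapCopies/mmseq2_${1}_vs_${2}_evalue-sDefault-maxSeq50.out" "${TMP_DIR}" \
#     -s 5.7 --search-type 3 --threads 20 --max-seqs 50
```

Because each job writes to its own folder, no job touches another job's `latest` symlink, which is where both error messages above come from.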

MMseqs2 also doesn't have a good way to set a total memory limit, but you can approximate one with --split-memory-limit. This should be about 80% of the memory you want MMseqs2 to use (in your case about 40GB, so --split-memory-limit 40G). This is relevant if other jobs are running on the same node too, as MMseqs2 will generally try to use all available memory.
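
The 80% rule of thumb can be sketched as a small calculation. The helper name `split_memory_limit` is hypothetical, and it assumes the allocation is given in whole gigabytes (here 50, matching `#SBATCH --mem 50G` in the script above).

```shell
#!/bin/bash
# Sketch: derive a --split-memory-limit value as about 80% of the job's memory.
# split_memory_limit is a hypothetical helper; its argument is whole gigabytes.
split_memory_limit() {
    local total_gb="$1"
    # Integer arithmetic: 50 GB * 80 / 100 = 40 GB
    printf '%sG\n' "$(( total_gb * 80 / 100 ))"
}

LIMIT="$(split_memory_limit 50)"
printf '%s\n' "--split-memory-limit ${LIMIT}"   # prints: --split-memory-limit 40G
```

The resulting flag would be appended to the mmseqs search command line alongside the other options.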

Sep 19 '22 09:09 milot-mirdita

It works like that, thank you for your fast reply!

Sep 19 '22 14:09 HeloiseMuller