ClinSV icon indicating copy to clipboard operation
ClinSV copied to clipboard

more processes than available cores

Open EricDeveaud opened this issue 3 years ago • 9 comments

Hello

using ClinSV-1.0.0 we noticed some potential problem when ClinSV is run through the use of taskset, batch system like slurm, etc...

clinsv have some harcoded number of process for xargs

xargs -P 15, xargs -P 12 which may be over the available procs for the process.

eg if I run ClinSV trough our standard slurm allocation (4 procs) we will have at least 15 processes that compete on 4 cores leading to some over subscription.

I dunno how to get the number of available procs for a process (not number of physical process on host) in perl like len(os.sched_getaffinity(0)))" in python or omp_get_max_threads(); in C++

it would be nice to have xargs -P value limited to the number of available procs

anyway a shell solution will be to compute the nnumber of available cpus npoc is adequat for that and replace harcoded values by the computed number something like

compute ncpus

(( ncpus = `nproc` < 15 ?  `nproc` : 15 )) 

then replace xargs -P 15 by xargs -P ${ncpus}

regards

Eric

EricDeveaud avatar Jan 24 '23 17:01 EricDeveaud

here's how I patched my copy of clinsv.

ncpus 15, 12 and 1 (not really usefull in case of 1, but anyway) allow to limit the nimber of parallel jobs when submited in a slurm allocation or in taskset context. NB I alos used $^X in place of plain perl call just to be sure to use the same interpreter.
see:https://perldoc.perl.org/perlvar#$%5EX

@@ -319,20 +316,25 @@
                open(OUT,">$$cJ{rShStem}.sh");  
                print OUT "$r_head\n\n";        
                print OUT "cd $r_OutDir/\n";
+               # avoid  overcommit. be sure to limit to available number of pro
cs
+               print OUT "(( ncpus15 = `nproc` < 15 ?  `nproc` : 15 ))\n";
+               print OUT "(( ncpus12 = `nproc` < 12 ?  `nproc` : 12 ))\n ";
+               print OUT "(( ncpus1 = `nproc` < 1 ?  `nproc` :1))\n";
                
                foreach $cT (("q0","q20","mq")){ # create bw of s1 for MQ>=0 and
 MQ>=20
                        if($cT eq "mq"){
-                               print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
"\$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wigM
Q.pl ";
+                               # use same perl than the one clinsv run with
+                               print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
"\$2}' $refFasta.chrom.sizes | xargs -P \${ncpus15} -t -i{} $^X  $S_SV_scriptDir
/bam2wigMQ.pl ";
                                print OUT "-s 1 -r \"{}\" -o $r_TmpDir/$cSample.
$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";
                                
-                               print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
\$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wigMQ.
pl ";
+                               print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
\$2}' $refFasta.chrom.sizes | xargs -P \${ncpus1} -t -i{} $^X $S_SV_scriptDir/ba
m2wigMQ.pl ";
                                print OUT "-s 1 -r \"{}\" -o $r_TmpDir/$cSample.
$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a \n";
 
                        }else{
-                               print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
"\$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wig.
pl ";
+                               print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
"\$2}' $refFasta.chrom.sizes | xargs -P \${ncpus15} -t -i{} $^X $S_SV_scriptDir/
bam2wig.pl ";
                                print OUT "-s 1 -q $cT -r \"{}\" -o $r_TmpDir/$c
Sample.$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";
                                
-                               print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
\$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wig.pl
 ";
+                               print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
\$2}' $refFasta.chrom.sizes | xargs -P \${ncpus1} -t -i{} $^X $S_SV_scriptDir/ba
m2wig.pl ";
                                print OUT "-s 1 -q $cT -r \"{}\" -o $r_TmpDir/$c
Sample.$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a\n";
                        }       
                        print OUT "cat $r_TmpDir/$cSample.$cT.*.wig > $r_TmpDir/$cSample.$cT.wig\n\n";

illustration.

rpm_maker://tmp > cat no_overcommit.sh
#!/bin/sh 
(( ncpus15 = `nproc` < 15 ?  `nproc` : 15 ))
(( ncpus12 = `nproc` < 12 ?  `nproc` : 12 ))
(( ncpus1 = `nproc` < 1 ?  `nproc` :1))

echo "ncpus15: ${ncpus15} - ncpus12: ${ncpus12} - ncpus1: ${ncpus1}"

and execution in various conditions

rpm_maker://tmp > nproc 
64
rpm_maker://tmp > sh no_overcommit.sh 
ncpus15: 15 - ncpus12: 12 - ncpus1: 1

now limit to 12 available procs

rpm_maker://tmp > taskset -c 0-11 sh no_overcommit.sh 
ncpus15: 12 - ncpus12: 12 - ncpus1: 1

even less

rpm_maker://tmp > taskset -c 0-3 sh no_overcommit.sh 
ncpus15: 4 - ncpus12: 4 - ncpus1: 1

regards

Eric

EricDeveaud avatar Feb 09 '23 15:02 EricDeveaud

Excellent feedback thank you Eric, we'll look to incorporate this. Can you please comment on the runtime performance with using less resources than we tested on?

Cheers

On Fri, 10 Feb 2023, 2:00 am Eric Deveaud, @.***> wrote:

here's how I patched my copy of clinsv.

ncpus 15, 12 and 1 (not really usefull in case of 1, but anyway) allow to limit the nimber of parallel jobs when submited in a slurm allocation or in taskset context. NB I alos used $^X in place of plain perl call just to be sure to use the same interpreter. see:https://perldoc.perl.org/perlvar#$%5EX

@@ -319,20 +316,25 @@ open(OUT,">$$cJ{rShStem}.sh"); print OUT "$r_head\n\n"; print OUT "cd $r_OutDir/\n";

  •           # avoid  overcommit. be sure to limit to available number of pro
    

cs

  •           print OUT "(( ncpus15 = `nproc` < 15 ?  `nproc` : 15 ))\n";
    
  •           print OUT "(( ncpus12 = `nproc` < 12 ?  `nproc` : 12 ))\n ";
    
  •           print OUT "(( ncpus1 = `nproc` < 1 ?  `nproc` :1))\n";
    
              foreach $cT (("q0","q20","mq")){ # create bw of s1 for MQ>=0 and
    

MQ>=20 if($cT eq "mq"){

  •                           print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
    

"$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wigM Q.pl ";

  •                           # use same perl than the one clinsv run with
    
  •                           print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
    

"$2}' $refFasta.chrom.sizes | xargs -P ${ncpus15} -t -i{} $^X $S_SV_scriptDir /bam2wigMQ.pl "; print OUT "-s 1 -r "{}" -o $r_TmpDir/$cSample. $cT -f $refFasta -b $rAlnDir/$cSample.bam\n";

  •                           print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
    

$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wigMQ. pl ";

  •                           print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
    

$2}' $refFasta.chrom.sizes | xargs -P ${ncpus1} -t -i{} $^X $S_SV_scriptDir/ba m2wigMQ.pl "; print OUT "-s 1 -r "{}" -o $r_TmpDir/$cSample. $cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a \n";

                    }else{
  •                           print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
    

"$2}' $refFasta.chrom.sizes | xargs -P 15 -t -i{} perl $S_SV_scriptDir/bam2wig. pl ";

  •                           print OUT "\n\nawk '\$2>=100000 {print \$1\":1-\
    

"$2}' $refFasta.chrom.sizes | xargs -P ${ncpus15} -t -i{} $^X $S_SV_scriptDir/bam2wig.pl "; print OUT "-s 1 -q $cT -r "{}" -o $r_TmpDir/$c Sample.$cT -f $refFasta -b $rAlnDir/$cSample.bam\n";

  •                           print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
    

$2}' $refFasta.chrom.sizes | xargs -P 1 -t -i{} perl $S_SV_scriptDir/bam2wig.pl ";

  •                           print OUT "\n\nawk '\$2<100000 {print \$1\":1-\"
    

$2}' $refFasta.chrom.sizes | xargs -P ${ncpus1} -t -i{} $^X $S_SV_scriptDir/bam2wig.pl "; print OUT "-s 1 -q $cT -r "{}" -o $r_TmpDir/$c Sample.$cT.small_contigs -f $refFasta -b $rAlnDir/$cSample.bam -a\n"; } print OUT "cat $r_TmpDir/$cSample.$cT.*.wig > $r_TmpDir/$cSample.$cT.wig\n\n";

illustration.

rpm_maker://tmp > cat no_overcommit.sh #!/bin/sh (( ncpus15 = nproc < 15 ? nproc : 15 )) (( ncpus12 = nproc < 12 ? nproc : 12 )) (( ncpus1 = nproc < 1 ? nproc :1))

echo "ncpus15: ${ncpus15} - ncpus12: ${ncpus12} - ncpus1: ${ncpus1}"

and execution in various conditions

rpm_maker://tmp > nproc 64 rpm_maker://tmp > sh no_overcommit.sh ncpus15: 15 - ncpus12: 12 - ncpus1: 1

now limit to 12 available procs

rpm_maker://tmp > taskset -c 0-11 sh no_overcommit.sh ncpus15: 12 - ncpus12: 12 - ncpus1: 1

even less

rpm_maker://tmp > taskset -c 0-3 sh no_overcommit.sh ncpus15: 4 - ncpus12: 4 - ncpus1: 1

regards

Eric

— Reply to this email directly, view it on GitHub https://github.com/KCCG/ClinSV/issues/42#issuecomment-1424325610, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEQQM2YRUC5Z2G2IJYSW4DWWUBADANCNFSM6AAAAAAUFMGTTU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

drmjc avatar Feb 09 '23 22:02 drmjc

as testing with NA12878_b38.bam is a long time consumer I did not tested with less that the requested ressources. I ran it using plain 64 CPUs ie axrgs -P 15, -P 12 as initialy so sorry I won't be abble to comment on performances

it would be nice to have a small dataset to test

EricDeveaud avatar Feb 10 '23 09:02 EricDeveaud

Agreed, I think @j-bradlee has a small test dataset... James?

On Fri, 10 Feb 2023, 8:56 pm Eric Deveaud, @.***> wrote:

as testing with NA12878_b38.bam is a long time consumer I did not tested with less that the requested ressources. I ran it using plain 64 CPUs ie axrgs -P 15, -P 12 as initialy so sorry I won't be abble to comment on performances

it would be nice to have a small dataset to test

— Reply to this email directly, view it on GitHub https://github.com/KCCG/ClinSV/issues/42#issuecomment-1425539700, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEQQM5IC4H3ZRV6WYDPFCLWWYGF7ANCNFSM6AAAAAAUFMGTTU . You are receiving this because you commented.Message ID: @.***>

drmjc avatar Feb 11 '23 04:02 drmjc

Heres the subsampled b38 bai file: https://drive.google.com/file/d/1OiFZA4z-k2eTNOYLktHflsI5pGwaijvI/view?usp=share_link Heres the subsampled b38 bam file: https://drive.google.com/file/d/1ohArdn1l2etlQunD8gI1TzHyPZWLRMuT/view?usp=share_link

J-Bradlee avatar Feb 13 '23 12:02 J-Bradlee

thanks,

as soon as Ican. get my hands on the my computer I'll try with this ones.

stay tunned

EricDeveaud avatar Feb 13 '23 12:02 EricDeveaud

maybee you can chnage bam file permissions to Anyone with the link. currently I can't get it from the cluster (different id) using gdown

EricDeveaud avatar Feb 14 '23 08:02 EricDeveaud

Apologies I thought I had set it to anyone with link. Try again now

J-Bradlee avatar Feb 14 '23 09:02 J-Bradlee

it's OK now thanks

EricDeveaud avatar Feb 14 '23 09:02 EricDeveaud