pySCENIC
a question about parallel computing [results]
Thanks for your recent update!
I have a question about parallel computing in pySCENIC.
Recently, I have been using Singularity to run the image "aertslab-pyscenic-0.12.0.sif". In the first step (GRN), I can control the parallelism by setting --num_workers. However, when I set --num_workers to 24 (or another number), I found that only three or so tasks were actually running:
$ top
top - 18:33:27 up 3:29, 1 user, load average: 25.84, 26.06, 26.40
tasks: 1509 total, 3 running, 1505 sleeping, 0 stopped, 1 zombie
%Cpu(s): 19.6 us, 0.3 sy, 0.0 ni, 79.9 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 773867.1 total, 663837.1 free, 102458.2 used, 7571.8 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 667016.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18932 bio 20 0 3320624 2.3g 69000 S 103.6 0.3 58:44.14 python
18977 bio 20 0 3771860 2.7g 68964 S 103.6 0.4 58:38.53 python
18938 bio 20 0 3742348 2.7g 69664 S 103.3 0.4 58:41.20 python
18943 bio 20 0 3840500 2.8g 69652 S 103.3 0.4 58:45.79 python
18959 bio 20 0 4055996 3.0g 69364 S 103.3 0.4 58:31.26 python
...
So I want to know the reason for this limitation. Could you please give me some advice?
My Linux cluster looks like this:
$ cat /proc/cpuinfo |grep "physical id"|sort |uniq|wc -l
2
$ cat /proc/cpuinfo |grep "processor"|wc -l
128
$ cat /proc/cpuinfo |grep "cores"|uniq
cpu cores : 32
Could you please tell me which characteristics of the machine limit the parallel computing?
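For reference, here is a quick check (plain Python standard library, nothing pySCENIC-specific) of how many CPUs a process is actually allowed to use. On a cluster, cgroup or batch-scheduler limits can make this smaller than what /proc/cpuinfo reports:

# How many CPUs does the kernel let this process use?
import os

print("logical CPUs on the machine:", os.cpu_count())
# On Linux, the affinity mask reflects cgroup/scheduler restrictions,
# which inside a batch job can be far below the machine-wide count.
print("CPUs usable by this process:", len(os.sched_getaffinity(0)))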
Hi @Shiywa
Is this the case for the entire run or only specific sections?
Best,
Seppe
For now, I have only run the GRN step on my cluster, so I don't know whether the other steps behave like this.
Well, when I ran the ctx step, I found that it did run in parallel:
singularity run aertslab-pyscenic-0.12.0.sif pyscenic ctx HPV16_CC_output.tsv hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather --annotations_fname motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl --mode dask_multiprocessing --num_workers 100 --output regulons.csv --expression_mtx_fname HPV16_CC1_count2.csv
However, although I set --num_workers to 100, the highest number of parallel tasks I observed was 36. Is that normal?
I think what happened is that you happened to monitor the progress of the GRN step at a time when it did not need more than 4 cores (i.e. there were only 4 more tasks to complete, so it does not make sense to use more cores in that case). That's why I asked whether it was using 4 cores for the entire time the GRN step was running. Chances are high that it was using more than 4 cores at some other time.
I cannot be sure of this, however.
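As a toy illustration (plain Dask here, not pySCENIC itself): when fewer tasks remain than workers, the surplus workers simply sit idle, so top only shows a handful of busy python processes:

# Toy example: 8 workers but only 3 remaining tasks -> at most 3 busy.
import time
from distributed import Client, LocalCluster

def work(i):
    time.sleep(2)  # stand-in for one remaining inference task
    return i

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=8, threads_per_worker=1)
    client = Client(cluster)
    print(client.gather(client.map(work, range(3))))
    client.close()
    cluster.close()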
Well, I kept monitoring the tasks of the GRN step with top, and the number of running tasks really stayed limited, to maybe under 6. In top I could see a huge number of python tasks in the sleeping state, while only a few tasks kept running.
Actually, I want to ask what the limiting factor for the parallel computing is. I noticed that you mentioned "cores". My cluster has two physical CPUs (sockets) with 32 cores each, giving 128 hardware threads in total. Could you please tell me which of these is associated with the num_workers parameter?
Regards!
num_workers is the number of workers (threads/processes) that are spawned.
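If you want full control over that number, a minimal sketch (using the documented arboreto API rather than the CLI; the TF-list filename allTFs_hg38.txt is a placeholder) is to run the GRN step from Python with an explicitly sized Dask cluster:

# Sketch: GRN step via arboreto with an explicitly sized Dask cluster.
import pandas as pd
from distributed import Client, LocalCluster
from arboreto.algo import grnboost2
from arboreto.utils import load_tf_names

if __name__ == "__main__":
    # One single-threaded worker per slot, analogous to --num_workers 24.
    cluster = LocalCluster(n_workers=24, threads_per_worker=1)
    client = Client(cluster)

    ex_matrix = pd.read_csv("HPV16_CC1_count2.csv", index_col=0)  # cells x genes
    tf_names = load_tf_names("allTFs_hg38.txt")  # placeholder TF list

    adjacencies = grnboost2(ex_matrix, tf_names=tf_names, client_or_address=client)
    adjacencies.to_csv("HPV16_CC_output.tsv", sep="\t", index=False)

    client.close()
    cluster.close()

Each Dask worker then shows up as a separate python process in top, which matches the listing above.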