scenicplus
Stuck at GSEA Step
I am following the 10X Genomics PBMC tutorial and running the wrapper function. Everything was fine until the GSEA step, which has now been stuck for over 40 hours:
2023-04-26 17:32:13,593 GSEA INFO Subsetting TF2G adjacencies for TF with motif.
2023-04-26 17:32:19,727 INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2023-04-26 17:32:20,376 GSEA INFO Running GSEA...
initializing: 23%|██▎ | 7094/31183 [23:37<05:21, 74.95it/s]
When I look at the node log, it shows an error saying the node may have been overloaded or terminated, or the network may be slow. But the memory usage shown for the cluster is well below 10%.
2023-04-27 13:24:41,619 ERROR node_head.py:302 -- Cannot reach the node, c96dd5c6ab1a61bc93d4ee80eff792af1b8762ed22e5afa5eb6cbef5, after timeout 4. This node may have been overloaded, terminated, or the network is slow.
NoneType: None
2023-04-27 13:24:48,627 ERROR node_head.py:302 -- Cannot reach the node, c96dd5c6ab1a61bc93d4ee80eff792af1b8762ed22e5afa5eb6cbef5, after timeout 4. This node may have been overloaded, terminated, or the network is slow.
NoneType: None
2023-04-27 13:24:51,920 INFO web_log.py:206 -- 127.0.0.1 [27/Apr/2023:17:24:51 +0000] 'GET /nodes?view=summary HTTP/1.1' 200 9532 bytes 6260 us 'http://127.0.0.1:8265/' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
2023-04-27 13:24:51,923 INFO web_log.py:206 -- 127.0.0.1 [27/Apr/2023:17:24:51 +0000] 'GET /nodes/c96dd5c6ab1a61bc93d4ee80eff792af1b8762ed22e5afa5eb6cbef5 HTTP/1.1' 200 9871 bytes 1948 us 'http://127.0.0.1:8265/' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
2023-04-27 13:24:54,614 INFO web_log.py:206 -- 127.0.0.1 [27/Apr/2023:17:24:54 +0000] 'GET /log_index HTTP/1.1' 200 391 bytes 43230 us 'http://127.0.0.1:8265/' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
2023-04-27 13:24:55,633 ERROR node_head.py:302 -- Cannot reach the node, c96dd5c6ab1a61bc93d4ee80eff792af1b8762ed22e5afa5eb6cbef5, after timeout 4. This node may have been overloaded, terminated, or the network is slow.
NoneType: None
2023-04-27 13:24:56,226 INFO web_log.py:206 -- 127.0.0.1 [27/Apr/2023:17:24:56 +0000] 'GET /log_proxy?url=http%3A%2F%2F127.0.0.1%3A52365%2Flogs HTTP/1.1' 200 3130 bytes 103802 us 'http://127.0.0.1:8265/' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
2023-04-27 13:24:58,168 INFO web_log.py:206 -- 127.0.0.1 [27/Apr/2023:17:24:58 +0000] 'GET /log_proxy?url=http%3A%2F%2F127.0.0.1%3A52365%2Flogs%2Fdashboard.err HTTP/1.1' 200 660 bytes 8014 us 'http://127.0.0.1:8265/' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
2023-04-27 13:25:01,210 INFO web_log.py:206 -- 127.0.0.1 [27/Apr/2023:17:25:01 +0000] 'GET /log_proxy?url=http%3A%2F%2F127.0.0.1%3A52365%2Flogs HTTP/1.1' 200 3130 bytes 5785 us 'http://127.0.0.1:8265/' 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
There seems to be activity in the cluster according to the Ray dashboard, but it has been stuck at 7094/31183 and I can't figure out why it is taking 40+ hours.
Hi @li-xuyang28
hmm... 40hrs is really very long..
How many cores were you using?
Best,
Seppe
Hi,
I'm running on 8 cores (fewer than the 12 suggested by the tutorial), locally on an iMac, because I was having so much trouble getting Ray to work on the cluster I have access to. Somehow the entire thing was just extremely slow for me. (I restarted the process, but it still gets stuck at this step.)
This is the 10X PBMC multiome data, but I did change the cell type annotation a bit (split the T cells into a few more subtypes).
Best, Yang
Hi again @SeppeDeWinter ,
I tried subsetting the object and running the build_grn function several times with the 10X PBMC data, but every run got stuck during initialization (at around 16166/18918). It takes about 4 minutes to get through the ones that were processed (consistent with the tutorial), and then it stays stuck indefinitely (>24h). According to the Ray dashboard there was still activity, but the nodes seemed to be idle. Is there any information I could provide to help figure out what happened?
Best, Yang
There is a chance that one of the worker processes crashed and that Ray didn't detect it and assumes it is still running. Try with fewer cores or on a better machine.
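One way to tell a stalled worker from a slow one is to wait on the tasks with a timeout and list the ones that never finished. The sketch below uses Python's standard `concurrent.futures` with threads rather than Ray, so it only illustrates the idea (Ray's own `ray.wait` also accepts a timeout). `find_stalled` is a hypothetical helper, not part of scenicplus or Ray.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def find_stalled(tasks, timeout_s):
    """Run zero-argument callables in parallel.

    Returns (results, stalled): results maps task index to return value for
    tasks that finished within timeout_s; stalled lists the indices that did not.
    """
    pool = ThreadPoolExecutor(max_workers=len(tasks))
    futures = [pool.submit(task) for task in tasks]
    done, not_done = wait(futures, timeout=timeout_s)
    results = {i: f.result() for i, f in enumerate(futures) if f in done}
    stalled = sorted(i for i, f in enumerate(futures) if f in not_done)
    # Do not block on the unfinished tasks; just stop accepting new work.
    pool.shutdown(wait=False)
    return results, stalled


if __name__ == "__main__":
    # The second task stands in for a worker that never reports back in time.
    tasks = [lambda: "fast", lambda: time.sleep(1.0)]
    print(find_stalled(tasks, timeout_s=0.2))
```

A task that shows up in `stalled` on every run, regardless of the timeout, is more likely hung than slow.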
Hi, has this problem been solved? I'm running into it as well.
Hi @CYorick
It might be memory related. The code in the development branch is more memory friendly.
See https://github.com/aertslab/scenicplus/discussions/202 on how to use it.
All the best,
Seppe
Thanks for your reply. Should I simply download the Snakemake directory without changing anything else, and run the whole pipeline automatically? What if I just want to run the build_grn function?
Best, Yorick
The problem can be solved by setting "ray_n_cpu" to None.
What do you mean by setting "ray_n_cpu" to None? Using a single core?
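A minimal sketch of that workaround, assuming `build_grn` is the function imported from `scenicplus.grn_builder.gsea_approach` in the tutorial and that it accepts a `ray_n_cpu` keyword. The wrapper name `build_grn_without_ray` is hypothetical, and `build_grn` is passed in as an argument so the snippet runs without scenicplus installed. What `None` does internally (for example, whether Ray is skipped entirely) is not confirmed in this thread.

```python
def build_grn_without_ray(build_grn, scplus_obj, **kwargs):
    """Hypothetical helper: call build_grn with ray_n_cpu forced to None.

    All other keyword arguments are forwarded unchanged.
    """
    kwargs["ray_n_cpu"] = None  # the setting reported to work in this thread
    return build_grn(scplus_obj, **kwargs)


# Assumed usage, with the import path taken from the tutorial:
# from scenicplus.grn_builder.gsea_approach import build_grn
# build_grn_without_ray(build_grn, scplus_obj, min_target_genes=10)
```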
I've tried to solve it as suggested, by cleaning the temporary directory and re-running the code, but it still doesn't work, even though I have 600 GB of space. Here are the messages I get:
(_ray_run_gsea_for_e_module pid=959428) /home/roger/anaconda3/envs/scenicplus/lib/python3.8/site-packages/gseapy/algorithm.py:87: RuntimeWarning: divide by zero encountered in divide
(_ray_run_gsea_for_e_module pid=959428) norm_tag = 1.0 / sum_correl_tag
(_ray_run_gsea_for_e_module pid=959428) /home/roger/anaconda3/envs/scenicplus/lib/python3.8/site-packages/gseapy/algorithm.py:91: RuntimeWarning: invalid value encountered in multiply
(_ray_run_gsea_for_e_module pid=959428) tag_indicator * correl_vector * norm_tag - no_tag_indicator * norm_no_tag,
(raylet) Spilled 5732 MiB, 13998 objects, write throughput 2560 MiB/s. Set RAY_verbose_spill_logs=0 to disable this message.
If anybody has encountered the same issue or could help me, that would be great.
Thank you.
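The two RuntimeWarnings quoted above point at `norm_tag = 1.0 / sum_correl_tag`, which NumPy flags when the denominator is zero. Judging by the variable names, that would presumably happen when the ranking weights of all genes in the set sum to zero. The simplified pure-Python sketch below of a weighted running-sum enrichment score shows where such a zero can come from and one way to guard against it. It is not gseapy's implementation; `running_sum_es` and its arguments are hypothetical, and whether these warnings are related to the stall is not established in this thread.

```python
import math


def running_sum_es(weights, in_set):
    """Simplified weighted running-sum enrichment score (illustrative only).

    weights: ranking metric per gene, already sorted by rank.
    in_set:  booleans, True where the gene belongs to the gene set.
    Returns NaN when the normalisation would divide by zero.
    """
    hit_total = sum(abs(w) for w, hit in zip(weights, in_set) if hit)
    n_miss = sum(1 for hit in in_set if not hit)
    if hit_total == 0 or n_miss == 0:
        # Same situation as a zero sum_correl_tag: nothing to normalise by.
        return math.nan
    running, best = 0.0, 0.0
    for w, hit in zip(weights, in_set):
        running += abs(w) / hit_total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    members = [True, False, True, False]
    print(running_sum_es([3, 2, 1, 0], members))  # well-defined score
    print(running_sum_es([0, 0, 0, 0], members))  # all hit weights are zero
```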
Hi @rogercasalsfr
I would also suggest using the development version of the code. See https://github.com/aertslab/scenicplus/discussions/202 for more info.
All the best,
Seppe