pySCENIC
Possible solutions for GRNBoost2/GENIE3 Dask issues
A recurring problem is that the GRN inference step of pySCENIC (which uses Arboreto's GRNBoost2/GENIE3 implementation) fails to complete. This appears to be caused by newer Dask releases being incompatible with the existing GRNBoost2/GENIE3 implementation.
Possible errors
ValueError: Metadata mismatch found in from_delayed
Expected partition of type DataFrame but got NoneType
ValueError: tuple is not allowed for map key
...
Possible solutions
- In many cases using an older version of the dask/distributed packages can help to fix this. This is ideally accomplished using the Docker images, which already contain the stable versions of these packages (see here for usage details). Or, to install these via pip:
pip install dask==1.0.0 distributed'>=1.21.6,<2.0.0'
- Alternatively, some users have reported that upgrading to the newest version of Dask can resolve this as well (#147).
- Another option is to use a helper script (arboreto_with_multiprocessing.py) that runs the Arboreto GRN algorithms (GRNBoost2, GENIE3) without Dask, which avoids the compatibility problem. See here for details; the basic usage is:
arboreto_with_multiprocessing.py \
    expr_mat.loom \
    allTFs_hg38.txt \
    --output adj.tsv \
    --num_workers 20
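To illustrate the idea behind the last option, here is a minimal sketch of spreading per-target work over a multiprocessing pool instead of a Dask cluster. It is not the code of arboreto_with_multiprocessing.py: the function names (score_target, infer_network) are hypothetical, and the per-target score is a plain absolute Pearson correlation standing in for the GRNBoost2/GENIE3 regression. It only shows the pattern of one task per target gene, collected into a ranked list of (TF, target, importance) links.

```python
import math
from multiprocessing import Pool


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def score_target(task):
    """Score every TF against one target gene.

    Stand-in for the per-target regression; the real algorithms fit a
    tree-based model here, this sketch uses |Pearson r| instead.
    """
    target, expression, tf_names = task
    return [
        (tf, target, abs(pearson(expression[tf], expression[target])))
        for tf in tf_names
        if tf != target
    ]


def infer_network(expression, tf_names, num_workers=1):
    """Return (tf, target, importance) links, strongest first.

    expression maps gene name -> list of values (one per cell).
    Each task carries the whole matrix, which is simple but wasteful
    for large inputs.
    """
    tasks = [(target, expression, tf_names) for target in expression]
    if num_workers <= 1:
        # Serial path: no pool, useful for debugging.
        per_target = list(map(score_target, tasks))
    else:
        # One process per worker; no Dask scheduler involved.
        with Pool(num_workers) as pool:
            per_target = pool.map(score_target, tasks)
    links = [link for target_links in per_target for link in target_links]
    return sorted(links, key=lambda link: link[2], reverse=True)
```

When calling infer_network with num_workers greater than 1 from a script, place the call under an if __name__ == "__main__": guard, since platforms that start workers by spawning a fresh interpreter (Windows, macOS) re-import the main module in each worker.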
Hello @cflerin
Bug report, possibly caused by Dask.
The command pyscenic grn {EXP_MTX_QC_FNAME} {HUMAN_TFS_FNAME} -o {ADJACENCIES_FNAME} --num_workers 16 only works when --num_workers is set to 16. With more than 16 workers, regardless of the number of cells or genes, the GRN step either hangs indefinitely or fails with the error "Worker exceeded 95% memory budget. Restarting". We reproduced this with 2,000 to 40,000 cells, 16 to 40 CPU cores, and 64 GB to 128 GB of memory, on both Mac and Windows.
A similar issue is reported here: https://github.com/aertslab/pySCENIC/issues/314
Thanks!
Best,
YJ
Hi @hyjforesight, you should create a new bug report and include all of the requested info on package versions. Having this info will make it much easier to address your issue.
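For anyone gathering that version info, the following is a small sketch that prints the Python version, the platform, and the installed versions of a few packages. The package list is an assumption about what is relevant here (it is not taken from the project's issue template), and the function names are hypothetical; adjust both as needed.

```python
import platform
import sys
from importlib import metadata

# Assumed set of packages worth reporting; edit to match the issue template.
PACKAGES = ("pyscenic", "arboreto", "dask", "distributed", "numpy", "pandas")


def package_versions(names=PACKAGES):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def environment_report(names=PACKAGES):
    """Build a plain-text report that can be pasted into a bug report."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    lines += [f"{name}: {version}" for name, version in package_versions(names).items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running it inside the same environment (conda env, virtualenv, or container) that runs pyscenic grn matters, since the versions can differ between environments on the same machine.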