pySCENIC

'gene_names' is not defined ERROR running arboreto_with_multiprocessing.py

Open lindaboshans opened this issue 2 years ago • 6 comments

Hello,

I was encountering dask issues while running pyscenic grn, so I switched to arboreto_with_multiprocessing.py as suggested in previous posts. However, I am now getting an error that gene_names is not defined. This was mentioned here previously but never resolved, so I am not sure how to proceed. My loom file has the correct structure (genes x cells). I tried running it from a Jupyter notebook, Docker, and the CLI, and got the same error in all three.

Code:

!arboreto_with_multiprocessing.py day7PD.loom all_TFs.txt \
    --method grnboost2 \
    --output adj.tsv \
    --num_workers 4 \
    --seed 777

Error:

Loaded expression matrix of 15727 cells and 24014 genes in 11.148365020751953 seconds...
Loaded 1797 TFs...
starting grnboost2 using 4 processes...
  0%|          | 0/24014 [00:10<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.1_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/bin/arboreto_with_multiprocessing.py", line 101, in run_infer_partial_network
    target_gene_name = gene_names[target_gene_index]
NameError: name 'gene_names' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/arboreto_with_multiprocessing.py", line 147, in <module>
    adjs = list(
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/Cellar/python@3.9/3.9.1_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
  File "/usr/local/Cellar/python@3.9/3.9.1_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/bin/arboreto_with_multiprocessing.py", line 101, in run_infer_partial_network
    target_gene_name = gene_names[target_gene_index]
NameError: name 'gene_names' is not defined

lindaboshans, Sep 02 '22 04:09

@lindaseong Could you try installing pySCENIC from scratch in a new virtual environment?

The gene_names variable is defined in main, so it shouldn't be undefined:

❯ arboreto_with_multiprocessing.py ${f_loom_path_scenic} ${f_tfs} \
∙         --method grnboost2 \
∙         --output adj.tsv \
∙         --num_workers 4 \
∙         --seed 777
Loaded expression matrix of 10280 cells and 20292 genes in 4.658531665802002 seconds...
Loaded 1797 TFs...
starting grnboost2 using 4 processes...
  0%|▍         | 38/20292 [00:35<7:55:12,  1.41s/it]

ghuls, Sep 15 '22 12:09

Are you running it on macOS by any chance? We run pySCENIC exclusively on Linux (x86_64), so it is possible that it doesn't work on macOS.
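
A possible explanation for the platform difference, offered as a hedged sketch rather than a confirmed diagnosis: on macOS (since Python 3.8) and Windows, multiprocessing starts workers with the "spawn" method by default, while Linux has traditionally used "fork". A spawned worker imports the script as a module and does not run the `if __name__ == "__main__":` block, so a global assigned there does not exist in the worker. The example below is not the code from arboreto_with_multiprocessing.py; `infer_for_target` and `run_all` are hypothetical names used only to illustrate passing the gene names to the worker explicitly.

```python
import multiprocessing as mp
from functools import partial


def infer_for_target(target_gene_index, gene_names):
    # Hypothetical stand-in for the per-gene worker. It receives gene_names
    # as an argument instead of reading a module-level global.
    return gene_names[target_gene_index]


def run_all(gene_names, num_workers=2, start_method="spawn"):
    # Bind gene_names to the worker so it is pickled and sent to every
    # child process, regardless of the start method in use.
    worker = partial(infer_for_target, gene_names=gene_names)
    ctx = mp.get_context(start_method)
    with ctx.Pool(num_workers) as pool:
        return list(pool.imap(worker, range(len(gene_names))))
```

When using this from a script, call `run_all(...)` from inside an `if __name__ == "__main__":` guard, since "spawn" re-imports the script in each worker.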

ghuls, Sep 15 '22 12:09

It should be fixed in master: https://github.com/aertslab/pySCENIC/commit/692132a097cd9533524e943cc67495231242c960

ghuls, Sep 15 '22 14:09

I have the same issue. Could you please tell me how you fixed it, @lindaseong @ghuls? My pyscenic is version 0.12.0. I guess I should install the master version, which was updated on 9/15? Could you please tell me how to install the newest master version, @ghuls? Thanks!

hyjforesight, Oct 21 '22 05:10

Hello @ghuls, NameError: name 'gene_names' is not defined happens when arboreto_with_multiprocessing.py is run in a Jupyter notebook on Windows. Is it possible to fix this bug for Windows? I switched to Ubuntu v20.04 and installed the newest master version of pySCENIC. The issue now is that the estimated run time is more than 17 years! Is it possible to reduce the running time?

git clone https://github.com/aertslab/pySCENIC.git
cd pySCENIC/
pip install .
cd pySCENIC/
arboreto_with_multiprocessing.py /home/hyjforesight/Mucinous_filtered_for_scenic.loom /home/hyjforesight/allTFs_hg38.txt -m grnboost2 -o /root/adj.tsv --num_workers 16
Loaded expression matrix of 20377 cells and 22062 genes in 5.994802713394165 seconds...
Loaded 1892 TFs...
starting grnboost2 using 16 processes...
  0%|                                         | 1/22062 [7:12:28<159014:37:42, 25948.63s/it]
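
The estimate in a progress line like this can be checked by hand: the remaining time is roughly the seconds per iteration multiplied by the number of genes still to process. Below is a minimal sketch of that arithmetic; `remaining_hours` is a hypothetical helper, not part of pySCENIC or tqdm.

```python
def remaining_hours(seconds_per_iteration, done, total):
    # Remaining iterations times the current rate, converted to hours.
    return seconds_per_iteration * (total - done) / 3600.0


# Values read from the progress line above: 1 of 22062 genes done
# at 25948.63 s/it.
hours = remaining_hours(25948.63, 1, 22062)
years = hours / 24 / 365
print(f"{hours:.0f} hours, about {years:.1f} years")
```

With only one iteration completed, the rate is based on a single sample, so the projection is very rough.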

Thanks! Best, Yuanjian

hyjforesight, Oct 24 '22 18:10

Hi @hyjforesight and @lindaseong

A new version of pySCENIC was released on PyPI today. Could you try again with the newest version?

Best,

Seppe

SeppeDeWinter, Nov 21 '22 14:11