grnboost2 TypeError: Must supply at least one delayed object
Hi!
GRNBoost2 fails with an error at the very last step, and the same happens when I use GENIE3. It seems to be a problem with Dask, but I could not figure out what is going on.
The code:
import os
import pandas as pd
from distributed import Client, LocalCluster
from arboreto.algo import grnboost2, genie3
from arboreto.utils import load_tf_names
in_file = '/Users/annasve/Desktop/data/transcriptomics/output/PyWGCNA/NBC_00001/log_tpm.csv'
tf_file = '/Users/annasve/Desktop/data/transcriptomics/output/arboreto/output/NBC_00001/tf_list.csv'
ex_matrix = pd.read_csv(in_file, index_col=0)
tf_names = load_tf_names(tf_file)
network = grnboost2(expression_data=ex_matrix, tf_names=tf_names, verbose=True)
The error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[15], line 9
6 tf_names = load_tf_names(tf_file)
8 # Run GRNBoost2 with explicitly provided gene_names and tf_names
----> 9 network = grnboost2(expression_data=ex_matrix, tf_names=tf_names, verbose = True)
11 network.to_csv(out_file)
File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/arboreto/algo.py:39, in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
10 def grnboost2(expression_data,
11 gene_names=None,
12 tf_names='all',
(...)
16 seed=None,
17 verbose=False):
18 """
19 Launch arboreto with [GRNBoost2] profile.
20
(...)
36 :return: a pandas DataFrame['TF', 'target', 'importance'] representing the inferred gene regulatory links.
37 """
---> 39 return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
40 gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
41 early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/arboreto/algo.py:120, in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
117 if verbose:
118 print('creating dask graph')
--> 120 graph = create_graph(expression_matrix,
121 gene_names,
122 tf_names,
123 client=client,
124 regressor_type=regressor_type,
125 regressor_kwargs=regressor_kwargs,
126 early_stop_window_length=early_stop_window_length,
127 limit=limit,
128 seed=seed)
130 if verbose:
131 print('{} partitions'.format(graph.npartitions))
File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/arboreto/core.py:450, in create_graph(expression_matrix, gene_names, tf_names, regressor_type, regressor_kwargs, client, target_genes, limit, include_meta, early_stop_window_length, repartition_multiplier, seed)
448 # gather the DataFrames into one distributed DataFrame
449 all_links_df = from_delayed(delayed_link_dfs, meta=_GRN_SCHEMA)
--> 450 all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
452 # optionally limit the number of resulting regulatory links, descending by top importance
453 if limit:
File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/dask_expr/io/_delayed.py:115, in from_delayed(dfs, meta, divisions, prefix, verify_meta)
112 dfs = [dfs]
114 if len(dfs) == 0:
--> 115 raise TypeError("Must supply at least one delayed object")
117 if meta is None:
118 meta = delayed(make_meta)(dfs[0]).compute()
TypeError: Must supply at least one delayed object
Hi,
I ran into the same issue recently while trying to run grnboost2 from a Python 3.12 conda environment with the default versions of dask and distributed.
I found a thread about the same bug on pySCENIC GitHub Issues: #561. It appears to be caused by changes (apparently recent) in the dask/distributed packages. The fix proposed in that thread is to install the following versions: dask-expr==0.5.3 distributed==2024.2.1. I tried doing so in a Python 3.12 environment, but that still produced the same error reported in the pySCENIC thread.
I then rebuilt the environment with Python 3.10.15 and dask-expr==0.5.3 distributed==2024.2.1. With this change the code ran to completion. Hopefully this helps other users who encounter the same issue.
tl;dr: Python 3.10.15 + dask-expr==0.5.3 distributed==2024.2.1 works fine; newer versions of Python, dask, and distributed lead to the bug above.
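Since the outcome seems to depend on the exact package versions, it may help to confirm what is actually installed in the active environment before and after pinning. Below is a minimal sketch using only the standard library; the helper name installed_versions and the PACKAGES list are made up for illustration and are not part of arboreto or dask.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in this thread; adjust the list as needed.
PACKAGES = ["arboreto", "dask", "dask-expr", "distributed"]


def installed_versions(names):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Report the interpreter version first, then each package.
    print("python", sys.version.split()[0])
    for name, ver in installed_versions(PACKAGES).items():
        print(name, ver if ver is not None else "not installed")
```

Running this inside the environment that the notebook kernel uses shows whether the pinned versions (dask-expr==0.5.3, distributed==2024.2.1) were really picked up.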
I also encountered the same problem. Following your suggestion, I switched from Python 3.12.3 to 3.10.15, but the same error still occurred. Would it make a difference to create a fresh environment with conda (conda create -n pyscenic python=3.10.15) instead of uninstalling Python 3.12.3 and reinstalling Python 3.10.15?
This error is caused by the create_graph function in arboreto.core and has been fixed on GitHub. However, the fix has not yet been released in version 0.1.6. You can patch the code of the installed 0.1.6 package as described here: https://github.com/aertslab/pySCENIC/issues/592#issuecomment-2567718783
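For readers who want to see the idea before opening the link, here is a rough sketch of the kind of guard that avoids the failure. It is not the linked patch reproduced verbatim. It assumes the list delayed_meta_dfs is empty because meta collection was not requested (include_meta left at its default), so the second from_delayed call receives an empty list. The function from_delayed_stub is a hypothetical stand-in that mimics only the empty-list check visible in the traceback (dask_expr/io/_delayed.py, lines 114-115), and gather is an illustrative name, not an arboreto function.

```python
def from_delayed_stub(dfs):
    """Hypothetical stand-in for dask's from_delayed.

    Mimics only the empty-list check shown in the traceback above.
    """
    if len(dfs) == 0:
        raise TypeError("Must supply at least one delayed object")
    return list(dfs)


def gather(delayed_link_dfs, delayed_meta_dfs, include_meta=False):
    """Gather the link frames always, the meta frames only on request."""
    all_links = from_delayed_stub(delayed_link_dfs)

    # Guard: skip the meta gather unless it was asked for, so an empty
    # delayed_meta_dfs list never reaches from_delayed.
    if include_meta:
        all_meta = from_delayed_stub(delayed_meta_dfs)
        return all_links, all_meta

    return all_links


# Without the guard, the empty meta list would raise the TypeError from
# the traceback; with it, the links are returned normally.
links = gather(["links_part_0", "links_part_1"], [])
print(links)
```

In the real package the equivalent change would go into create_graph in arboreto/core.py, around the from_delayed(delayed_meta_dfs, meta=_META_SCHEMA) line shown in the traceback; check the linked comment for the exact edit.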