GLUE
error in calculating biadjacency_matrix
Hi, I am following the pipeline from the GLUE publication to run regulatory inference. I am currently at /GLUE/tree/master/experiments/RegInf/s03_peak_gene_validation.py, but I get the following error:
==========================================================
Clustering metacells...
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3936564/4102559450.py in

~/utils.py in metacell_corr(rna, atac, use_rep, n_meta, skeleton, method)
     26 def metacell_corr(rna, atac, use_rep, n_meta=200, skeleton=None, method="spr"):
     27     print("Clustering metacells...")
---> 28     rna_agg, atac_agg = get_metacells_paired(rna, atac, use_rep, n_meta=n_meta)
     29     print("Computing correlation...")
     30     return _metacell_corr(rna_agg, atac_agg, skeleton=skeleton, method=method)

~/utils.py in get_metacells_paired(rna, atac, use_rep, n_meta)
     15     kmeans.train(rna.obsm[use_rep])
     16     _, rna.obs["metacell"] = kmeans.index.search(rna.obsm[use_rep], 1)
---> 17     atac.obs["metacell"] = rna.obs["metacell"].to_numpy()
     18     rna_agg = scglue.data.aggregate_obs(rna, "metacell")
     19     atac_agg = scglue.data.aggregate_obs(atac, "metacell")

~/miniconda3/envs/mypython3/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3610         else:
   3611             # set column
-> 3612             self._set_item(key, value)
   3613
   3614     def _setitem_slice(self, key: slice, value):

~/miniconda3/envs/mypython3/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3782         ensure homogeneity.
   3783         """
-> 3784         value = self._sanitize_column(value)
   3785
   3786         if (

~/miniconda3/envs/mypython3/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, value)
   4507
   4508         if is_list_like(value):
-> 4509             com.require_length_match(value, self.index)
   4510         return sanitize_array(value, self.index, copy=True, allow_2d=True)
   4511

~/miniconda3/envs/mypython3/lib/python3.7/site-packages/pandas/core/common.py in require_length_match(data, index)
    530     if len(data) != len(index):
    531         raise ValueError(
--> 532             "Length of values "
    533             f"({len(data)}) "
    534             "does not match length of index "

ValueError: Length of values (80789) does not match length of index (73872)
Do the 'rna' and 'atac' representations given by use_rep need to be the same size?
Can someone please explain the error?
Thanks for the report!
The scripts in /GLUE/tree/master/experiments/RegInf include comparisons with other regulatory inference methods, which are unnecessary for end users and cannot be run as is. In this case, the corr = biadjacency_matrix(...) line computes correlation-based regulatory inference as one of these comparisons. We were using paired RNA and ATAC data there, so the utils.metacell_corr function only works with paired multi-omics profiles. Your data seems to be unpaired (the number of cells differs between RNA and ATAC: 80789 vs. 73872 in the error message), which is why it throws this error.
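To make the failure mode concrete, here is a minimal sketch that reproduces the same pandas error with toy tables. The two DataFrames are stand-ins for rna.obs and atac.obs, and same_cell_count is a hypothetical helper, not part of scglue or the experiment scripts. An equal cell count is a necessary condition for paired data, not a sufficient one.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for rna.obs and atac.obs with different numbers of cells
rna_obs = pd.DataFrame(index=[f"rna_{i}" for i in range(5)])
atac_obs = pd.DataFrame(index=[f"atac_{i}" for i in range(3)])

# One metacell label per RNA cell, mirroring what get_metacells_paired stores
rna_obs["metacell"] = np.array([0, 1, 0, 2, 1])


def same_cell_count(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: necessary (not sufficient) condition for paired data."""
    return len(a) == len(b)


paired = same_cell_count(rna_obs, atac_obs)

try:
    # Positional copy of labels: only valid when row i is the same cell in both tables
    atac_obs["metacell"] = rna_obs["metacell"].to_numpy()
    error_message = None
except ValueError as err:
    error_message = str(err)

print("same cell count:", paired)
print("error:", error_message)
```

Running a check like same_cell_count before calling a paired-only function gives an earlier and clearer signal than the pandas traceback.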
A dedicated regulatory inference tutorial with no unnecessary code is still in the works (see #15). It should be ready by the end of this month. Sorry for the inconvenience. I will let you know then.
Thank you, Jeff, for your reply. Please let me know when the dedicated tutorial is ready. I look forward to it. In the meantime, I plan to subset the bigger object to match the size of the smaller object for integration. Do you think this approach will work, and is it theoretically sound?
Of course! I'll let you know.
For the second question, the GLUE training process ensures that the smaller object is upsampled to match the bigger object, so I think there is no need to subset the bigger object.
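As a rough illustration of what upsampling means here, the sketch below draws indices from the smaller dataset with replacement until there are as many as in the larger one. This is only an assumption about the general idea, written with NumPy and toy sizes. It is not scglue's actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rna, n_atac = 10, 4  # toy sizes; ATAC plays the smaller dataset here

# Every cell of the larger dataset is visited once per pass
rna_idx = rng.permutation(n_rna)

# Cells of the smaller dataset are drawn with replacement to match that length
atac_idx = rng.choice(n_atac, size=n_rna, replace=True)

print(len(rna_idx), len(atac_idx))
```

With this kind of sampling, no cells from the larger dataset need to be discarded.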
Thank you, Jeff, for the new tutorial.
You are welcome! The new tutorial requires an updated version (v0.3.0) of scglue. I've released it on PyPI, but I'm still having some problems with the bioconda update. Hopefully I can get it working in a few days.
Let me know if you have further problems with the new tutorial though :)
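For anyone unsure which version they have installed, a small check along these lines may help. It is a sketch that assumes Python 3.8 or later (for importlib.metadata) and that the package is installed under the distribution name scglue. parse_version and installed_scglue_version are hypothetical helpers written for this example.

```python
from importlib import metadata

REQUIRED = (0, 3, 0)  # version named in the post above


def parse_version(text: str) -> tuple:
    """Turn '0.3.0' into (0, 3, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_scglue_version():
    """Return the installed scglue version string, or None if it is not installed."""
    try:
        return metadata.version("scglue")
    except metadata.PackageNotFoundError:
        return None


found = installed_scglue_version()
if found is None:
    print("scglue is not installed")
elif parse_version(found) >= REQUIRED:
    print(f"scglue {found} is new enough for the tutorial")
else:
    print(f"scglue {found} is older than 0.3.0; try: pip install --upgrade scglue")
```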