Core dumped when running mu.pp.neighbors(mdata, key_added='wnn')

Open 111kakaluote opened this issue 2 years ago • 4 comments

Describe the bug

When I test the muon pipeline on a dataset of 6400 cells, running mu.pp.neighbors(mdata, key_added='wnn') fails with:

*** Error in `python': malloc(): smallbin double linked list corrupted: 0x0000558a3b39a900 ***

System

  • Python 3.8
  • Anndata 0.7.8
  • muon 0.1.2
  • scanpy 1.8.2
  • mudata 0.1.2

111kakaluote, Dec 15 '22 05:12

Hi @111kakaluote, 6400 cells should not be an issue for mu.pp.neighbors though it depends on the available resources of course. For instance, you can find a tutorial with CITE-seq data of similar size here.

What is the size of the feature space that is being used? In standard workflows, a reduced representation such as PCA is computed before calculating cell neighbourhood graphs. Is that the case here as well?
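One way to answer both questions at once is to print, for each modality, the number of cells, the number of features, and the shape of the PCA embedding if there is one. The following is only a minimal sketch: the helper name `summarize_modalities` is made up for illustration, the demo objects are stand-ins with invented sizes so that the snippet runs without any data, and it assumes the PCA result is stored under the scanpy default key `X_pca`.

```python
from types import SimpleNamespace


def summarize_modalities(mods, rep_key="X_pca"):
    """Map each modality name to (n_obs, n_vars, shape of obsm[rep_key] or None).

    `mods` is any mapping of modality name -> AnnData-like object,
    for example the `.mod` dictionary of a MuData object.
    """
    summary = {}
    for name, adata in mods.items():
        # A missing key means no reduced representation has been stored yet.
        rep_shape = tuple(adata.obsm[rep_key].shape) if rep_key in adata.obsm else None
        summary[name] = (adata.n_obs, adata.n_vars, rep_shape)
    return summary


# Hypothetical stand-ins (invented sizes), used only to make the sketch runnable.
demo = {
    "rna": SimpleNamespace(
        n_obs=100, n_vars=2000, obsm={"X_pca": SimpleNamespace(shape=(100, 50))}
    ),
    "prot": SimpleNamespace(n_obs=100, n_vars=20, obsm={}),
}

for name, info in summarize_modalities(demo).items():
    print(name, info)
```

With real data the call would be `summarize_modalities(mdata.mod)`; a `None` in the last position means that modality has no PCA embedding stored.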

gtca, Jan 03 '23 23:01

@gtca Hi, there are 20015 gene features and 18 protein features, and I have reduced the representation with PCA. My script is:

    # Assumed import aliases (not shown in the original post);
    # malldata and args are defined elsewhere in the script
    import scanpy as sc
    import muon as mu
    from muon import prot as pt

    # CLR-normalize the protein modality, then scale it and run PCA
    pt.pp.clr(malldata['prot'])
    sc.pp.scale(malldata['prot'], max_value=10)
    sc.tl.pca(malldata['prot'])

    # RNA analysis: keep the raw counts in a separate layer
    malldata['rna'].layers['counts'] = malldata['rna'].X.copy()

    # Filter cells by mitochondrial content
    malldata['rna'].var['mt'] = malldata['rna'].var_names.str.contains("^[Mm][Tt]-")  # annotate the group of mitochondrial genes as 'mt'
    sc.pp.calculate_qc_metrics(malldata['rna'], qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
    mu.pp.filter_obs(malldata['rna'], 'pct_counts_mt', lambda x: x <= args.mtfilter)

    # Normalize RNA, select highly variable genes, scale, and run PCA
    sc.pp.normalize_total(malldata['rna'], target_sum=1e4)
    sc.pp.log1p(malldata['rna'])
    sc.pp.highly_variable_genes(malldata['rna'], min_mean=0.02, max_mean=4, min_disp=0.5)
    malldata['rna'].raw = malldata['rna']
    sc.pp.scale(malldata['rna'], max_value=10)
    sc.tl.pca(malldata['rna'], svd_solver='arpack')

    # Subset the protein modality to the cells kept in RNA, then build per-modality neighbor graphs
    mu.pp.intersect_obs(malldata)
    sc.pp.neighbors(malldata['rna'])
    sc.pp.neighbors(malldata['prot'])

    # Calculate weighted nearest neighbors
    mu.pp.neighbors(malldata, key_added='wnn', low_memory=True)
    mu.tl.umap(malldata, neighbors_key='wnn', random_state=10)

I now pass low_memory=True, so the memory usage may be lower.
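Before the weighted nearest neighbors step, it can be worth confirming that the inputs look as expected. The check below is a rough sketch under two assumptions: that the WNN step works on modalities containing the same cells, and that each modality already has a neighbor graph stored under the scanpy default key `neighbors` in `.uns`. The helper name `check_wnn_inputs` and the demo objects are hypothetical and exist only to make the snippet runnable without data.

```python
from types import SimpleNamespace


def check_wnn_inputs(mods, neighbors_key="neighbors"):
    """Return a list of problems found; an empty list means none were found.

    `mods` is any mapping of modality name -> AnnData-like object,
    for example the `.mod` dictionary of a MuData object.
    """
    problems = []
    # Cells that are present in every modality.
    shared = set.intersection(*(set(a.obs_names) for a in mods.values()))
    for name, adata in mods.items():
        extra = len(set(adata.obs_names) - shared)
        if extra:
            problems.append(f"{name}: {extra} cell(s) missing from another modality")
        if neighbors_key not in adata.uns:
            problems.append(f"{name}: no '{neighbors_key}' entry in .uns")
    return problems


# Hypothetical stand-ins, used only to make the sketch runnable.
demo = {
    "rna": SimpleNamespace(obs_names=["c1", "c2", "c3"], uns={"neighbors": {}}),
    "prot": SimpleNamespace(obs_names=["c1", "c2"], uns={}),
}

for problem in check_wnn_inputs(demo):
    print(problem)
```

With the script above, the call would be `check_wnn_inputs(malldata.mod)`, placed right before `mu.pp.neighbors(...)`. An empty result does not rule out a memory problem; it only excludes these two input issues.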

gabumon0, Feb 03 '23 05:02

Thank you, @gabumon0. Do you encounter the same issue at the line with mu.pp.neighbors()? Is there any log that you might be able to share?
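If no log is available yet, one possible way to get one is Python's built-in `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV or SIGABRT. A glibc malloc corruption error typically ends in an abort, so this may show which Python call was active at the time. This is only a sketch: the helper name `enable_crash_traceback` and the default log file name are made up for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path="mu_neighbors_crash.log"):
    """Dump the tracebacks of all threads to `log_path` on a fatal signal.

    The returned file object must stay open for as long as the handler is
    enabled, so keep a reference to it until the script finishes.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Usage: call once at the top of the script, before the step that crashes.
# crash_log = enable_crash_traceback()
```

The same handler can be switched on without editing the script by running it with `python -X faulthandler`, in which case the traceback goes to standard error.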

gtca, Feb 21 '23 04:02

@gtca Sorry, I am also @111kakaluote. @gabumon0 is my other account, and I forgot to switch GitHub IDs.

gabumon0, Feb 21 '23 04:02