muon
Segmentation fault in mu.neighbor
**Describe the bug**
Hi, I'm using muon to run the co-embedding of a 280k-cell multiome dataset, submitted to the LSF system with 40 CPUs and 300 GB of RAM. It errored with "Segmentation fault" in error.log and "Exited with exit code 139" in output.log. When I use a subset of 2,000 cells, it works totally fine. Do you know how to fix it? Thank you very much for your help.

Here is my code:
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import muon as mu
from muon import atac as ac
import mudata as md
from mudata import MuData
import os
import bbknn
mdata = mu.read("Cellarchr.h5mu") # 280k multiome dataset
mdata.update()
mu.pp.intersect_obs(mdata)
# Since subsetting was performed after calculating nearest neighbours,
# we have to calculate them again for each modality.
bbknn.bbknn(mdata['rna'], batch_key='brc_code', n_pcs=50, metric='euclidean', trim=200)
bbknn.bbknn(mdata['atac'], batch_key='brc_code', n_pcs=40, metric='euclidean', trim=200)
# Calculate weighted nearest neighbors
mu.pp.neighbors(mdata, key_added='wnn', n_multineighbors=50)
# reports: Segmentation fault
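For reference, a minimal sketch of how the 2,000-cell comparison could be reproduced on a random subset. The `subsample_mdata` helper and its `n_cells`/`seed` arguments are illustrative, not from the original report; it assumes the MuData object can be sliced by an integer index along observations, like an AnnData object.

```python
import numpy as np
import bbknn
import muon as mu

# Hypothetical helper: draw a random subset of cells so the same pipeline
# can be re-run at a small size (the report says 2,000 cells complete fine).
def subsample_mdata(mdata, n_cells=2000, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(mdata.n_obs, size=n_cells, replace=False)
    return mdata[idx].copy()

mdata = mu.read("Cellarchr.h5mu")
mdata.update()
mu.pp.intersect_obs(mdata)

small = subsample_mdata(mdata)
# Per-modality neighbours have to be recomputed on the subset before WNN
bbknn.bbknn(small['rna'], batch_key='brc_code', n_pcs=50, metric='euclidean', trim=200)
bbknn.bbknn(small['atac'], batch_key='brc_code', n_pcs=40, metric='euclidean', trim=200)
mu.pp.neighbors(small, key_added='wnn', n_multineighbors=50)  # completes at 2,000 cells
```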
**System**
- OS: linux
- Python version 3.8.13
- Versions of libraries involved:

| Package | Version |
| --- | --- |
| anndata | 0.8.0 |
| bbknn | 1.5.1 |
| h5py | 3.7.0 |
| leidenalg | 0.9.0 |
| loompy | 3.0.7 |
| louvain | 0.7.1 |
| mudata | 0.2.1 |
| muon | 0.1.2 |
| networkx | 2.8.6 |
| notebook | 6.4.12 |
| numba | 0.55.2 |
| numpy | 1.22.4 |
| numpy-groupies | 0.9.19 |
| pandas | 1.4.4 |
| scanpy | 1.9.1 |
| scikit-learn | 1.1.2 |
| scikit-misc | 0.1.4 |
| scipy | 1.9.1 |
| seaborn | 0.12.0 |
| sklearn | 0.0.post1 |
| tornado | 6.2 |
| tqdm | 4.64.1 |
| umap-learn | 0.5.3 |
Hey @Feilijiang, thanks for letting us know we should take a look at the performance on large datasets! It looks like a memory-related issue but there's hardly more that I can say from this information.
Is this a reproducible issue?
Do you know the memory consumption of the `mu.pp.neighbors()` call?
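For reference, one way to capture the peak memory of that call is a minimal sketch using only the standard library's `resource` module (on Linux, `ru_maxrss` reports peak resident set size in kilobytes); it assumes `mdata` has already been loaded and preprocessed as in the script above:

```python
import resource
import muon as mu

def peak_rss_gb():
    # ru_maxrss is in kilobytes on Linux; convert to gigabytes
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2

# `mdata` is assumed to come from the pipeline above, with per-modality
# neighbours already computed.
print(f"peak RSS before neighbors: {peak_rss_gb():.1f} GB")
mu.pp.neighbors(mdata, key_added='wnn', n_multineighbors=50)
print(f"peak RSS after neighbors:  {peak_rss_gb():.1f} GB")
```

If the process is killed by the segmentation fault before the second print, the LSF job report (or an external monitor) would be the only record of the peak usage.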
I'm not sure whether it is reproducible. But I did try it several times with `n_multineighbors` ranging from 200 down to 20 on the command line, and it just ended without any notification. Then I submitted the job and it errored with a segmentation fault. This is the memory usage from the LSF job.
Thank you for the help!
Is there any log associated with the segmentation fault that you could share?
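If no separate log exists, a sketch of how to get at least a Python-level traceback at the moment of the crash is to enable the standard library's `faulthandler` at the top of the script (or run it with `python -X faulthandler`); on SIGSEGV it dumps the current stack of every thread to stderr, which here would end up in the LSF error.log:

```python
import faulthandler
faulthandler.enable()  # dump Python tracebacks to stderr on SIGSEGV and similar fatal signals

import muon as mu

mdata = mu.read("Cellarchr.h5mu")
mdata.update()
# ... rest of the pipeline as above ...
mu.pp.neighbors(mdata, key_added='wnn', n_multineighbors=50)
```

The dumped stack would likely show whether the crash happens inside the neighbour search, sparse matrix routines, or elsewhere, which would help narrow down the memory-related suspect.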