no response at "Performing cosine normalization..."
Hi, thank you for this awesome implementation of MNN in Python, it's great work! When I run MNN on my data (~30,000 genes × ~60,000 cells), with either all genes or a specified HVG subset, it seems to get stuck without giving any hint why. At the beginning of the run it spawns as many processes as I have CPU cores, but only 2 or 3 of them are active, and memory usage climbs to 300 GB. After that, all processes sleep with no CPU usage, and it stays that way overnight (>12 h). Also, when I downsample to ~15,000 cells, with all genes or HVGs (~5,000 genes), memory usage is still huge and it hangs at "Performing cosine normalization...".
1. Could you give some suggestions to solve these problems?
2. Could you provide the script you mention in the README ("Finishes correcting ~50000 cells/19 batches * ~30000 genes in ~12h on a 16 core 32GB mem server")? I want to make sure my script is correct.
Thank you!

The hang at cosine normalization has been reported repeatedly, possibly due to Python's multiprocessing. I will release a Cython-optimized version, hopefully this weekend, to solve it.
Meanwhile, could you try `mnnpy.settings.normalization = 'seq'` to change the normalization behaviour and see if the problem remains?
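(A minimal sketch for context; the key point is that the setting has to be changed before `mnn_correct` is called:)

```python
import mnnpy

# Switch cosine normalization to sequential mode before calling
# mnn_correct, bypassing the parallel code path
mnnpy.settings.normalization = 'seq'
```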
About 2: I used the exact script in the README, only with more `adata`s:

```python
corrected = mnnpy.mnn_correct(sample1, sample2, sample3,
                              var_subset=hvgs,
                              batch_categories=["1", "2", "3"])
adata = corrected[0]
```
Since the scaled genes other than the HVGs are usually not necessary in the following steps, you could subset each sample first:

```python
# Subset each AnnData to the highly variable genes before correcting
sample1, sample2, sample3 = (s[:, hvgs] for s in (sample1, sample2, sample3))

corrected = mnnpy.mnn_correct(sample1, sample2, sample3,
                              batch_categories=["1", "2", "3"])
adata = corrected[0]
```

to significantly reduce the computation.
I used `mnnpy.settings.normalization = 'seq'`, but it looks the same.
When I use HVGs (~2,000 genes) mnnpy works well, but when I increase to ~5,000 genes, mnnpy still gets stuck at cosine normalization.
For now, I am going to re-run my data with the Intel Python Distribution you suggested.
Hi,
Maybe I have figured it out: with large datasets, scanpy converts the data to a sparse matrix, and the cosine normalization doesn't recognize sparse matrices, which leads to the huge memory cost and the hang at this step.
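If that diagnosis is right, one user-side workaround (a sketch, assuming the batches are AnnData objects with a sparse `.X`) would be to densify the matrices before calling `mnn_correct`, without touching the library:

```python
import scipy.sparse as sp

# Densify each batch's expression matrix up front so the cosine
# normalization step never sees a sparse matrix
for s in (sample1, sample2, sample3):
    if sp.issparse(s.X):
        s.X = s.X.toarray()
```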
Therefore, I just revised `mnnpy/mnnpy/utils.py` line 33 from

```python
datas = [data.astype(np.float32) for data in datas]
```

to

```python
datas = [data.toarray().astype(np.float32) for data in datas]
```
It seems to have solved the problem; maybe you could test it.
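A slightly more defensive variant of that patch (a sketch, not tested against the repo) would only densify when the input is actually sparse, so dense arrays keep their existing layout:

```python
from scipy.sparse import issparse

# Convert sparse inputs to dense float32; leave dense arrays as-is
# (np is already imported in utils.py)
datas = [data.toarray().astype(np.float32) if issparse(data)
         else data.astype(np.float32)
         for data in datas]
```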
I just saw that you're at Peking University. Could I add you on WeChat? I'm doing my PhD at Tongji.
Haha, nice! My WeChat is 17600716991.