no response at "Performing cosine normalization..."
Hi, thank you for this awesome implementation of MNN in Python, it's great work! When I run MNN on my data (~30,000 genes × ~60,000 cells), with either all genes or a specified HVG subset, it seems to get stuck without giving any hint why. At the beginning of the run it spawns as many processes as I have CPU cores, but only 2 or 3 of them are active, and memory usage climbs to 300 GB. After that, all processes sleep with no CPU usage, and it stays that way overnight (>12 h). Also, when I downsample to ~15,000 cells, with all genes or HVGs (~5,000 genes), memory usage is still huge and it hangs at "Performing cosine normalization...".
1. Could you give some suggestions to solve these problems?
2. Could you provide the script you mention in the README ("Finishes correcting ~50000 cells/19 batches * ~30000 genes in ~12h on a 16 core 32GB mem server")? I want to make sure my script is correct.
Thank you!

The hang at cosine normalization has been reported repeatedly, possibly due to Python's multiprocessing. I will release a Cython-optimized version, hopefully this weekend, to solve it.
Meanwhile, could you try `mnnpy.settings.normalization = 'seq'` to change the normalization behaviour and see if the problem remains?
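(A minimal sketch for context; the key point is that the setting has to be changed before `mnn_correct` is called:)

```python
import mnnpy

# Switch cosine normalization to sequential mode before calling
# mnn_correct, bypassing the parallel code path
mnnpy.settings.normalization = 'seq'
```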
About 2: I used the exact script in the README, only with more `adata`s:

```python
corrected = mnnpy.mnn_correct(sample1, sample2, sample3,
                              var_subset=hvgs,
                              batch_categories=["1", "2", "3"])
adata = corrected[0]
```
Since the scaled genes other than the HVGs are usually not necessary in the following steps, you could subset each sample first:

```python
# Subset each AnnData to the highly variable genes before correcting
sample1, sample2, sample3 = (s[:, hvgs] for s in (sample1, sample2, sample3))

corrected = mnnpy.mnn_correct(sample1, sample2, sample3,
                              batch_categories=["1", "2", "3"])
adata = corrected[0]
```

to significantly reduce the computation.
I used `mnnpy.settings.normalization = 'seq'`, but it looks the same.
When I use HVGs (~2,000 genes) mnnpy works well, but when I increase to ~5,000 genes, mnnpy still gets stuck at cosine normalization.
For now, I am going to re-run my data with the Intel Python Distribution you suggested.
Hi,
Maybe I have figured it out: with large datasets, scanpy converts the data to a sparse matrix, and the cosine normalization doesn't recognize sparse matrices, which leads to the huge memory cost and the hang at this step.
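If that diagnosis is right, one user-side workaround (a sketch, assuming the batches are AnnData objects with a sparse `.X`) would be to densify the matrices before calling `mnn_correct`, without touching the library:

```python
import scipy.sparse as sp

# Densify each batch's expression matrix up front so the cosine
# normalization step never sees a sparse matrix
for s in (sample1, sample2, sample3):
    if sp.issparse(s.X):
        s.X = s.X.toarray()
```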
Therefore, I just revised `mnnpy/mnnpy/utils.py` line 33 from

```python
datas = [data.astype(np.float32) for data in datas]
```

to

```python
datas = [data.toarray().astype(np.float32) for data in datas]
```
It seems to have solved the problem; maybe you could test it.
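A slightly more defensive variant of that patch (a sketch, not tested against the repo) would only densify when the input is actually sparse, so dense arrays keep their existing layout:

```python
from scipy.sparse import issparse

# Convert sparse inputs to dense float32; leave dense arrays as-is
# (np is already imported in utils.py)
datas = [data.toarray().astype(np.float32) if issparse(data)
         else data.astype(np.float32)
         for data in datas]
```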
I just saw that you're at Peking University. Could I add you on WeChat? I'm doing my PhD at Tongji.
Haha, nice! My WeChat is 17600716991.