muon atac.pp.scopen fails while allocating a second array after computing scopen but before writing result to disk

Is there a way around this? I think the original scopen project doesn't require this. (https://github.com/CostaLab/scopen/blob/master/vignettes/epiScanpy.ipynb)

02/03/2023 15:54:25, iteration:  484, violation:  0.00052132
02/03/2023 15:55:08, iteration:  485, violation:  0.00051939
02/03/2023 15:55:49, iteration:  486, violation:  0.00051745
02/03/2023 15:56:29, iteration:  487, violation:  0.00051554
02/03/2023 15:57:10, iteration:  488, violation:  0.00051364
02/03/2023 15:57:53, iteration:  489, violation:  0.00051179
02/03/2023 15:58:41, iteration:  490, violation:  0.00050995
02/03/2023 15:59:27, iteration:  491, violation:  0.00050813
02/03/2023 16:00:12, iteration:  492, violation:  0.00050633
02/03/2023 16:00:58, iteration:  493, violation:  0.00050454
02/03/2023 16:01:45, iteration:  494, violation:  0.00050276
02/03/2023 16:02:29, iteration:  495, violation:  0.00050102
02/03/2023 16:03:12, iteration:  496, violation:  0.00049927
02/03/2023 16:03:53, iteration:  497, violation:  0.00049755
02/03/2023 16:04:38, iteration:  498, violation:  0.00049584
02/03/2023 16:05:20, iteration:  499, violation:  0.00049414
[total time:  6h 6m 3s ]
Traceback (most recent call last):
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
    _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 24, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/methods.py", line 307, in write_basic
    f.create_dataset(k, data=elem, **dataset_kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/group.py", line 161, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 48, in make_new_dset
    data = base.array_for_new_object(data, specified_dtype=dtype)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/base.py", line 118, in array_for_new_object
    data = np.asarray(data, order="C", dtype=as_dtype)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 240. GiB for an array with shape (65627, 491773) and data type float64

Feb 03 '23 23:02 alexlenail

Hey @alexlenail,

Thanks for reporting. I think this happens because the current interface imputes the matrix by default.

It also seems that scOpen's interfaces have been reworked since the interface in muon.atac was written, so I'll try to upgrade the interface in muon.atac as well.

One thing to note here is that scOpen itself has --no-impute=False as a default argument and is generally proposed as an imputation method. Following this issue, I would be more inclined not to perform imputation by default and to focus on the latent space instead, but I'd be curious to hear what you think about that.
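For a sense of scale, the size in the error message can be reproduced from the shape and dtype reported in the traceback. This is only a back-of-the-envelope sketch: the helper name `dense_size_gib` is made up for illustration, and the shape is presumably cells by peaks.

```python
import numpy as np

def dense_size_gib(shape, dtype=np.float64):
    """Size in GiB of a dense array with the given shape and dtype."""
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)
    return n_elements * np.dtype(dtype).itemsize / 2**30

# Shape and dtype copied from the _ArrayMemoryError message above.
shape = (65627, 491773)

size_f64 = dense_size_gib(shape, np.float64)
size_f32 = dense_size_gib(shape, np.float32)

print(f"float64: {size_f64:.1f} GiB")
print(f"float32: {size_f32:.1f} GiB")
```

The float64 figure comes out at roughly 240 GiB, matching the error. Any step that makes a second dense copy of a matrix this size would need that much memory again.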

Feb 06 '23 16:02 gtca

To comment on the issue title, I don't think muon.atac.pp.scopen writes anything to disk...

Feb 06 '23 16:02 gtca

I ran scopen to impute my ATAC data using the scopen package directly, and it did not cause a memory error, so I think muon may be allocating more arrays than it needs to?

Feb 07 '23 16:02 alexlenail

I believe imputation is performed by default via the main interface (see here), but scopen_dr(), which was introduced after the interface in muon was written, does not perform imputation.

We'll upgrade the interface!
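To illustrate why skipping imputation matters for memory, here is a toy NumPy sketch. It assumes the imputed matrix is reconstructed as the product of two low-rank factors, which is an assumption about the method and not something stated in this thread. The sizes and the rank are hypothetical, and no scopen or muon function is called.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; n_obs, n_var and rank are hypothetical, not taken from the issue.
n_obs, n_var, rank = 200, 1000, 10

W = rng.random((n_obs, rank))   # low-dimensional representation (obs x rank)
H = rng.random((rank, n_var))   # loadings (rank x var)

# "Imputation" in this sketch: the dense reconstruction of the full matrix.
imputed = W @ H

factor_bytes = W.nbytes + H.nbytes
dense_bytes = imputed.nbytes

print(f"factors: {factor_bytes} bytes")
print(f"dense reconstruction: {dense_bytes} bytes")
```

Memory for the factors grows with (n_obs + n_var) * rank, while the dense reconstruction grows with n_obs * n_var, so keeping only the latent space avoids the large allocation.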

Feb 21 '23 04:02 gtca
