atac.pp.scopen fails while allocating a second array after computing scopen but before writing result to disk
Is there a way around this? I think the original scopen project doesn't require this. (https://github.com/CostaLab/scopen/blob/master/vignettes/epiScanpy.ipynb)
02/03/2023 15:54:25, iteration: 484, violation: 0.00052132
02/03/2023 15:55:08, iteration: 485, violation: 0.00051939
02/03/2023 15:55:49, iteration: 486, violation: 0.00051745
02/03/2023 15:56:29, iteration: 487, violation: 0.00051554
02/03/2023 15:57:10, iteration: 488, violation: 0.00051364
02/03/2023 15:57:53, iteration: 489, violation: 0.00051179
02/03/2023 15:58:41, iteration: 490, violation: 0.00050995
02/03/2023 15:59:27, iteration: 491, violation: 0.00050813
02/03/2023 16:00:12, iteration: 492, violation: 0.00050633
02/03/2023 16:00:58, iteration: 493, violation: 0.00050454
02/03/2023 16:01:45, iteration: 494, violation: 0.00050276
02/03/2023 16:02:29, iteration: 495, violation: 0.00050102
02/03/2023 16:03:12, iteration: 496, violation: 0.00049927
02/03/2023 16:03:53, iteration: 497, violation: 0.00049755
02/03/2023 16:04:38, iteration: 498, violation: 0.00049584
02/03/2023 16:05:20, iteration: 499, violation: 0.00049414
[total time: 6h 6m 3s ]
Traceback (most recent call last):
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
    _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/registry.py", line 24, in wrapper
    result = func(g, k, *args, **kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/anndata/_io/specs/methods.py", line 307, in write_basic
    f.create_dataset(k, data=elem, **dataset_kwargs)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/group.py", line 161, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 48, in make_new_dset
    data = base.array_for_new_object(data, specified_dtype=dtype)
  File "/home/gridsan/lenail/.conda/envs/py39/lib/python3.9/site-packages/h5py/_hl/base.py", line 118, in array_for_new_object
    data = np.asarray(data, order="C", dtype=as_dtype)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 240. GiB for an array with shape (65627, 491773) and data type float64
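For reference, the 240 GiB figure in the error follows directly from the shape and dtype it reports. A quick sanity check, using only the numbers from the message above (nothing here is muon- or scOpen-specific):

```python
import numpy as np

# Shape and dtype copied from the _ArrayMemoryError message.
shape = (65627, 491773)
itemsize = np.dtype(np.float64).itemsize  # bytes per float64 element

# A dense array stores every element, zeros included.
n_elements = shape[0] * shape[1]
dense_gib = n_elements * itemsize / 2**30

print(f"{n_elements:,} elements -> {dense_gib:.1f} GiB as dense float64")
```

So any step that materializes the full matrix as a dense float64 array needs roughly this much memory in one contiguous allocation, regardless of how sparse the original data were.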
Hey @alexlenail,
Thanks for reporting. I think this happens because the current interface imputes the matrix by default.
It also seems that scOpen's interfaces have been reworked since the interface in muon.atac was written, so I'll try to upgrade the interface in muon.atac as well.
One thing to note here is that scOpen itself has --no-impute=False as a default argument and is generally proposed as an imputation method. Following this issue, I would be more inclined not to perform imputation by default and to focus on the latent space instead, but I'd be curious to hear what you think about that.
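To illustrate why imputation versus latent space matters for memory, here is a rough sketch. It assumes the imputed matrix is the product of two low-rank factors, as in an NMF-style factorization; the toy sizes and the rank of 30 are hypothetical, and this is not scOpen's or muon's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes so the example runs anywhere; rank is a hypothetical choice.
n_rows, n_cols, rank = 200, 300, 5

# Low-rank factors, standing in for the output of a factorization.
W = rng.random((n_rows, rank))
H = rng.random((rank, n_cols))

# "Imputation" in this sketch: the dense reconstruction of the full matrix.
imputed = W @ H

factor_bytes = W.nbytes + H.nbytes
imputed_bytes = imputed.nbytes
print(f"toy factors: {factor_bytes} bytes, toy imputed matrix: {imputed_bytes} bytes")

# Same comparison for the shape in this issue, computed without allocating.
real_rows, real_cols, real_rank = 65627, 491773, 30  # rank is an assumption
real_factor_gib = (real_rows + real_cols) * real_rank * 8 / 2**30
real_imputed_gib = real_rows * real_cols * 8 / 2**30
print(f"factors: {real_factor_gib:.3f} GiB, imputed: {real_imputed_gib:.1f} GiB")
```

Under these assumptions the factors stay small while the reconstruction scales with the full matrix size, which is the argument for keeping only the latent representation by default.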
To comment on the issue title: I don't think muon.atac.pp.scopen writes anything to disk...
I ran scopen to impute my ATAC data using the scopen package directly, and it did not cause a memory error, so maybe muon is allocating more arrays than it needs to?
I believe imputation is performed by default via the main interface (see here), but scopen_dr(), which was introduced later than the interface in muon, does not perform imputation.
We'll upgrade the interface!