mnnpy
Overflow error
I'm trying to apply MNN correction to my data, basically following the README of this project, but the following call fails:
corrected = mnnpy.mnn_correct(*tn5_data_list, batch_categories=batches)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/mnn.py", line 126, in mnn_correct
    svd_mode=svd_mode, do_concatenate=do_concatenate, **kwargs)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/mnn.py", line 157, in mnn_correct
    var_subset, n_jobs)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/utils.py", line 54, in transform_input_data
    in_scaling = p_n.map(l2_norm, in_batches)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
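
If I read the traceback correctly, the pool workers receive each batch via pickle, and Python 3.7's default pickle protocol (3) cannot serialize a bytes payload larger than 4 GiB; protocol 4 (the default from Python 3.8 onwards) lifts that limit. A minimal sketch of the limit itself, independent of mnnpy (assuming enough free RAM for a ~4.4 GB array):

import pickle
import numpy as np

# 550M float64 values ~ 4.4 GB, just over the 4 GiB limit of pickle protocol 3.
big = np.zeros(550_000_000, dtype=np.float64)

try:
    pickle.dumps(big, protocol=3)  # what multiprocessing uses by default on Python 3.7
except OverflowError as err:
    print(err)  # cannot serialize a bytes object larger than 4 GiB

payload = pickle.dumps(big, protocol=4)  # succeeds: protocol 4 supports objects over 4 GiB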
Indeed, I'm dealing with a large dataset (three AnnData objects with 10,000 cells and 1M features).
A solution would be to first process the data in a standard way (without MNN) until I have identified the features worth retaining (on the order of 20k), and then rerun MNN on those features only. Would that give similar results?
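
Concretely, I have in mind something like the sketch below. It assumes the batches are already normalized and log-transformed and share the same var_names; scanpy's highly_variable_genes is just one way to pick the ~20k features, and the intersection step is there so every batch ends up with an identical feature set before mnn_correct:

import mnnpy
import scanpy as sc

# tn5_data_list and batches are the objects from the failing call above.
adatas = tn5_data_list

# Select ~20k highly variable features per batch, then intersect across batches.
hvg_sets = []
for ad in adatas:
    sc.pp.highly_variable_genes(ad, n_top_genes=20000)
    hvg_sets.append(set(ad.var_names[ad.var['highly_variable']]))
keep = sorted(set.intersection(*hvg_sets))

# Subset each batch to the shared feature set, then run MNN on the much smaller matrices.
subsets = [ad[:, keep].copy() for ad in adatas]
corrected = mnnpy.mnn_correct(*subsets, batch_categories=batches)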