
Overflow error

Open dawe opened this issue 5 years ago • 0 comments

I'm trying to apply mnn to my data, basically following the README of this project, but the call fails with an OverflowError:

corrected = mnnpy.mnn_correct(*tn5_data_list, batch_categories=batches)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/mnn.py", line 126, in mnn_correct
    svd_mode=svd_mode, do_concatenate=do_concatenate, **kwargs)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/mnn.py", line 157, in mnn_correct
    var_subset, n_jobs)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/utils.py", line 54, in transform_input_data
    in_scaling = p_n.map(l2_norm, in_batches)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB

Indeed, I'm dealing with a large dataset (3 AnnData objects with 10,000 cells and 1M features). One workaround would be to process the data in the standard way first (without mnn) until I have identified the features worth retaining (on the order of 20k), and then run mnn on those features only. Would that work, and would it give similar results?
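
As a rough sanity check on whether the reduced feature set would even get past this error, here is a back-of-the-envelope sketch. It assumes each batch reaches the worker as a single dense matrix with 4 bytes per value (float32) and takes 10,000 cells per batch as an upper bound; I have not verified either in mnnpy, and the function names are purely illustrative.

```python
# Back-of-the-envelope estimate: does one batch stay under the 4 GiB limit
# quoted in the traceback? Assumes a dense matrix with 4 bytes per value
# (float32); this is an assumption, not something checked in mnnpy.

GIB = 1024 ** 3
PICKLE_LIMIT_BYTES = 4 * GIB  # limit named in the OverflowError message


def dense_batch_bytes(n_cells, n_features, bytes_per_value=4):
    """Size in bytes of one dense n_cells x n_features matrix."""
    return n_cells * n_features * bytes_per_value


def fits_under_limit(n_cells, n_features, bytes_per_value=4):
    """True if the dense matrix is smaller than the 4 GiB limit."""
    size = dense_batch_bytes(n_cells, n_features, bytes_per_value)
    return size < PICKLE_LIMIT_BYTES


full = dense_batch_bytes(10_000, 1_000_000)  # all 1M features
subset = dense_batch_bytes(10_000, 20_000)   # ~20k retained features

print(f"full:   {full / GIB:.2f} GiB, fits: {fits_under_limit(10_000, 1_000_000)}")
print(f"subset: {subset / GIB:.2f} GiB, fits: {fits_under_limit(10_000, 20_000)}")
```

Under these assumptions a full batch comes to roughly 37 GiB, far above the limit, while a 20k-feature batch stays below 1 GiB. This only speaks to the serialization error; it says nothing about whether the corrected results would be similar.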

dawe, Mar 10 '20 10:03