
OverflowError: cannot serialize a bytes object larger than 4 GiB (reduction.py)

Open asumser opened this issue 4 years ago • 10 comments

When starting training on Windows 10 with TensorFlow 2.4.4, an OverflowError: cannot serialize a bytes object larger than 4 GiB appears in multiprocessing/reduction.py (line 60).

I don't really know what I am doing, but the error goes away if I change line 58 to: def dump(obj, file, protocol=4):
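For context, a minimal sketch of the patched function (based on CPython 3.7's Lib/multiprocessing/reduction.py; the stock version defaults to protocol=None, which falls back to pickle's default protocol):

```python
# Sketch of Lib/multiprocessing/reduction.py with the workaround applied.
# The unpatched function uses protocol=None, which falls back to
# pickle.DEFAULT_PROTOCOL (3 before Python 3.8) and caps serialized
# objects at 4 GiB.
from multiprocessing.reduction import ForkingPickler

def dump(obj, file, protocol=4):
    '''Replacement for pickle.dump() using ForkingPickler.'''
    ForkingPickler(file, protocol).dump(obj)
```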

asumser avatar Dec 07 '21 00:12 asumser

Man, I stumbled upon the exact same error today. I am trying to deploy a processing container agent on AWS ECR.

Do you have a link to whatever made you make this change? It could be related to a particular version of the multiprocessing package.

jeromelecoq avatar Dec 07 '21 00:12 jeromelecoq

I found it on some random forum, but I think this is the source:

https://docs.python.org/3/library/pickle.html

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
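A quick way to check which protocol your interpreter defaults to, and to force a newer one explicitly (just a sketch):

```python
import pickle
import sys

# Protocols 0-3 use 32-bit length prefixes, so a single bytes object larger
# than 4 GiB cannot be serialized. Protocol 4 (added in Python 3.4) lifts
# that limit, but it only became the default in Python 3.8.
print(sys.version_info[:3], pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Forcing the protocol explicitly works on older interpreters too:
blob = pickle.dumps(b"stand-in for a very large payload", protocol=4)
```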

asumser avatar Dec 07 '21 01:12 asumser

OK. I will try upgrading to Python 3.8 and see if that fixes it.

jeromelecoq avatar Dec 07 '21 01:12 jeromelecoq

My workaround doesn't work and gives an invalid syntax error. Should I try with Python 3.8?

asumser avatar Dec 07 '21 20:12 asumser

Yes, I have found this to go away with a Docker image running Python 3.8 on AWS.

jeromelecoq avatar Dec 07 '21 20:12 jeromelecoq

I am not sure if this is related, but I've been getting the following error:

Epoch 1/2
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "c:\Users---.conda\envs\deepinterp\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "c:\Users---.conda\envs\deepinterp\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 748, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "c:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 727, in pool_fn
    initargs=(seqs, None, get_worker_id_queue()))
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError

I tried reducing the batch size and increasing the number of steps per epoch, but to no avail. I also tried updating the conda environment to Python 3.8, but that throws the following error instead:

Traceback (most recent call last):
  File "Documents\DeepInterpTest\traineg1.py", line 3, in <module>
    from deepinterpolation.generic import JsonSaver, ClassLoader
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\deepinterpolation\generic.py", line 98, in <module>
    class ClassLoader:
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\deepinterpolation\generic.py", line 105, in ClassLoader
    from deepinterpolation import network_collection
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\deepinterpolation\network_collection.py", line 1, in <module>
    from tensorflow.keras.layers import (
  File "C:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\__init__.py", line 38, in <module>
    import six as _six
ModuleNotFoundError: No module named 'six'

Can anyone help me out with this?

nataliekoh avatar Nov 17 '22 04:11 nataliekoh

How much RAM do you have on your machine?

jeromelecoq avatar Nov 17 '22 06:11 jeromelecoq

Hi Jerome, I have 46 GB of RAM.

nataliekoh avatar Nov 17 '22 19:11 nataliekoh

Turn off multi_processing. When multiprocessing is enabled, the generator can be duplicated in each worker process, which can result in much larger memory use (depending on the generator you use). This is an option of the training object; see the sketch below.
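Something like this in your training parameters (a sketch only; the exact key name depends on your deepinterpolation version, so treat "use_multiprocessing" as an assumption and double-check against the trainer class you use; the other values are placeholders that go into the same training JSON the example scripts build):

```python
# Hedged sketch: the flag may be named "use_multiprocessing" or
# "multi_processing" depending on the deepinterpolation version.
training_param = {
    "type": "trainer",
    "name": "core_trainer",
    "batch_size": 5,                # placeholder values
    "steps_per_epoch": 10,
    "nb_workers": 4,
    "use_multiprocessing": False,   # avoid duplicating the generator per worker
}
```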

jeromelecoq avatar Nov 18 '22 04:11 jeromelecoq

That seems to have solved the problem. Thank you!

nataliekoh avatar Nov 22 '22 04:11 nataliekoh