OverflowError: cannot serialize a bytes object larger than 4 GiB (reduction.py)
When starting training on Windows 10 with TensorFlow 2.4.4, an OverflowError: cannot serialize a bytes object larger than 4 GiB is raised from multiprocessing/reduction.py (line 60).
I don't really know what I am doing, but the error goes away if I change line 58 to: def dump(obj, file, protocol=4):
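If you would rather not edit the standard library file in site-packages, the same effect can be had by monkey-patching it from the top of the training script. Just a sketch of that idea; the only change from the stock dump() is the hard-coded protocol 4:

```python
# Sketch: patch multiprocessing.reduction.dump at runtime instead of editing
# reduction.py in place, so child processes are spawned with pickle protocol 4
# (which supports objects larger than 4 GiB).
import multiprocessing.reduction as reduction


def _dump_with_protocol_4(obj, file, protocol=None):
    # Same body as the stdlib dump(), but forcing protocol 4.
    reduction.ForkingPickler(file, 4).dump(obj)


# Apply the patch before any training / Pool creation happens.
reduction.dump = _dump_with_protocol_4
```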
Man, I stumbled upon the exact same error today. I am trying to deploy a processing container agent on AWS ECR.
Do you have a link to whatever made you make this change? It could be related to a particular version of the multiprocessing package.
I found it on some random forum, but I think this is the source:
https://docs.python.org/3/library/pickle.html
Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
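In other words, protocols 3 and below store the length of a bytes object in a 32-bit field, which is where the 4 GiB ceiling comes from, while protocol 4 uses 64-bit lengths. A small illustration (note it needs well over 8 GiB of free RAM to actually run):

```python
import pickle

big = bytes(4 * 1024**3 + 1)  # just over 4 GiB of zero bytes

try:
    pickle.dumps(big, protocol=3)          # protocols <= 3: 32-bit length field
except OverflowError as exc:
    print("protocol 3:", exc)              # cannot serialize a bytes object larger than 4 GiB

data = pickle.dumps(big, protocol=4)       # protocol 4 handles > 4 GiB objects
print("protocol 4 pickled", len(data), "bytes")
```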
Ok. I will try upgrading to python 3.8 and see if that fixes it.
My workaround doesn't work and gives an invalid syntax error. Should I try with Python 3.8?
Yes, I have found this goes away with a Docker image running Python 3.8 on AWS.
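If you want to double-check that the rebuilt environment is really the one being picked up, a quick sanity check like this should report 3.8+ and a default pickle protocol of at least 4:

```python
import pickle
import sys

# Python 3.8 and later make protocol 4 the default, so multiprocessing's
# reduction.dump() no longer hits the 4 GiB limit without any patching.
print(sys.version)
print("default pickle protocol:", pickle.DEFAULT_PROTOCOL)
print("highest pickle protocol:", pickle.HIGHEST_PROTOCOL)
```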
I am not sure if this is related, but I've been getting the following error:
Epoch 1/2
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "c:\Users---.conda\envs\deepinterp\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "c:\Users---.conda\envs\deepinterp\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 748, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "c:\Users---.conda\envs\deepinterp\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 727, in pool_fn
    initargs=(seqs, None, get_worker_id_queue()))
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\Users---.conda\envs\deepinterp\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError
I tried reducing the batch size and increasing the number of steps per epoch, but to no avail. I also tried updating the conda environment to Python 3.8, but that throws up the following error instead:
Traceback (most recent call last):
  File "Documents\DeepInterpTest\traineg1.py", line 3, in
Can anyone help me out with this?
How much RAM do you have on your machine?
Hi Jerome, I have 46 GB of RAM.
Turn off multi_processing. When multiprocessing is used, your generator can be duplicated in each worker process, which can result in much larger memory use (depending on the generator you use). This is an option of the training object.
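For reference, with plain TF 2.x tf.keras this is the use_multiprocessing/workers pair on fit(); the training object in this project may expose the same switch under a slightly different name, so treat this only as a sketch:

```python
import numpy as np
import tensorflow as tf


# Minimal sketch (TF 2.4-era tf.keras). The key part is use_multiprocessing=False,
# so the generator is never pickled into separate worker processes and therefore
# never duplicated in memory.
class RandomSequence(tf.keras.utils.Sequence):
    def __init__(self, n_batches=10, batch_size=8):
        self.n_batches = n_batches
        self.batch_size = batch_size

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        x = np.random.rand(self.batch_size, 16).astype("float32")
        y = np.random.rand(self.batch_size, 1).astype("float32")
        return x, y


model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(16,))])
model.compile(optimizer="adam", loss="mse")

model.fit(
    RandomSequence(),
    epochs=2,
    workers=1,
    use_multiprocessing=False,  # run the generator in the main process
)
```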
That seems to have solved the problem. Thank you!