simpletransformers
use_multiprocessing = False doesn't seem to work?
Hi!
I've implemented a T5.1-large fine-tuning pipeline in Colab, but I keep getting the following error message, even after setting use_multiprocessing = False:
Epoch 1 of 1: 0%| 0/1 [00:00<?, ?it/s]
Epochs 0/1. Running Loss: 2.2631: 100%| 75331/75331 [5:26:14<00:00, 3.90it/s]
0%| 1/93864 [00:30<797:05:36, 30.57s/it]
Exception in thread Thread-16:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
task = get()
File "/usr/lib/python3.7/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
storage = cls._new_shared_fd(fd, size)
RuntimeError: unable to mmap 360 bytes from file <filename not specified>: Cannot allocate memory (12)
I've tried numerous combinations of train batch size and eval batch size, to no avail; it seems like use_multiprocessing = False is being ignored by the model args, or perhaps tokenizer multithreading is the issue?
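For reference, here's roughly how I'm setting the args (a minimal sketch; the checkpoint name and batch sizes below are illustrative, I've tried several):

from simpletransformers.t5 import T5Model, T5Args

model_args = T5Args()
model_args.use_multiprocessing = False  # should disable the feature-conversion worker pool
model_args.train_batch_size = 8         # illustrative; several values tried
model_args.eval_batch_size = 8          # illustrative

model = T5Model("t5", "google/t5-v1_1-large", args=model_args)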
TIA
Can you try using use_hf_datasets=True? This seems to be related to dataset batching.
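A minimal sketch of how that would look (assuming your T5Args setup from above):

from simpletransformers.t5 import T5Args

model_args = T5Args()
model_args.use_multiprocessing = False
model_args.use_hf_datasets = True  # route data loading/batching through HF datasets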
When I attempt to train mT5-base, it runs out of memory while decoding evaluations at the end of an epoch, breaking the script. I believe it's related to the number of workers and multiprocessing/multithreading, because I now get the error below when I explicitly set the number of workers and other variables to 1:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

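From that message, Python is asking for the standard main guard, so spawned worker processes can re-import the script without re-running the training code. A minimal guarded sketch (the toy DataFrame, model name, and args here are just for illustration):

import pandas as pd
from simpletransformers.t5 import T5Model, T5Args

def main():
    # toy training data in the prefix/input_text/target_text format T5Model expects
    train_df = pd.DataFrame(
        [["translate english to french", "hello", "bonjour"]],
        columns=["prefix", "input_text", "target_text"],
    )

    model_args = T5Args()
    model_args.dataloader_num_workers = 0
    model_args.use_multiprocessing = False

    model = T5Model("mt5", "google/mt5-base", args=model_args)
    model.train_model(train_df)

if __name__ == "__main__":
    main()  # spawned workers re-import this module; the guard keeps them from re-entering training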
Before this multiprocessing error, it would crash the process with an out-of-memory message, then apparently spawn multiple threads to reload and reprocess the dataset. I have a 32 GB machine with a 3090, so I don't believe these issues are related to model size.
I attempted setting the process count to 1 as well; same issue.

import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"  # disable tokenizer thread parallelism
model_args.dataloader_num_workers = 0           # load batches in the main process, no worker subprocesses
model_args.process_count = 1                    # single process for feature conversion
model_args.use_multiprocessing_for_evaluation = False  # no worker pool during evaluation

This did the trick.
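Presumably this works because, with dataloader_num_workers = 0 and multiprocessing disabled for evaluation, tensors stay in the main process and never take torch's shared-memory pickling path (the rebuild_storage_fd call in the traceback above), so there is nothing left to mmap across processes.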