
use_multiprocessing = False doesn't seem to work?

Open · pablogranolabar opened this issue 2 years ago · 5 comments

Hi!

I've implemented a T5 v1.1-large fine-tuning pipeline in Colab, but I keep getting the following error, even after setting use_multiprocessing = False:

Epoch 1 of 1:   0%  0/1 [00:00<?, ?it/s]
Epochs 0/1. Running Loss: 2.2631: 100%  75331/75331 [5:26:14<00:00, 3.90it/s]
  0%  1/93864 [00:30<797:05:36, 30.57s/it]

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
    storage = cls._new_shared_fd(fd, size)
RuntimeError: unable to mmap 360 bytes from file <filename not specified>: Cannot allocate memory (12)

I've tried numerous combinations of training and eval batch sizes, to no avail; it seems like use_multiprocessing = False is being ignored by the model args, or perhaps tokenizer multithreading is the issue?
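For context, here is a minimal sketch of the kind of setup being described; the model name, epoch count, and training DataFrame are assumptions, not the exact script:

import pandas as pd
from simpletransformers.t5 import T5Model, T5Args

# Placeholder data; the real pipeline fine-tunes on a full dataset.
train_df = pd.DataFrame(
    [["summarize", "some long input text", "a short summary"]],
    columns=["prefix", "input_text", "target_text"],
)

model_args = T5Args()
model_args.num_train_epochs = 1
model_args.use_multiprocessing = False  # the setting reported as ignored

model = T5Model("t5", "google/t5-v1_1-large", args=model_args)
model.train_model(train_df)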

TIA

pablogranolabar commented on Mar 7, 2022

Can you try using use_hf_datasets=True? This seems to be related to dataset batching.
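In code that is a one-line change, assuming the same model_args object as in the original setup:

# Let HuggingFace Datasets handle loading/featurization instead of the
# default multiprocessing pool (available in recent simpletransformers).
model_args.use_hf_datasets = True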

ThilinaRajapakse commented on Mar 14, 2022

When I attempt to train mT5-base, it runs out of memory when decoding evaluations at the end of an epoch, which breaks the script. I believe it has to do with the number of workers and multiprocessing/multithreading, as I now get the error below even when I explicitly set the number of workers and other related variables to 1:

        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

[screenshot of the full traceback]
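For reference, the idiom that message points at is Python's main-module guard; a minimal sketch (the main() body is a placeholder):

def main():
    # Build the model and call train_model() here.
    print("training entry point")

if __name__ == "__main__":
    # With spawn-based process creation, worker processes re-import this
    # module; the guard prevents them from re-running the entry point.
    main()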

Before this multithreading error, it would crash the process with an out-of-memory error, then it appeared to use multiple threads to reload and reprocess the dataset. I have a 32 GB machine with a 3090, so I don't believe these issues are related to model size.

ArtanisTheOne commented on Apr 13, 2022

I attempted setting the process count to 1 as well; same issue.

[screenshot of the same error]

ArtanisTheOne commented on Apr 13, 2022

import os

# Keep everything in the main process: no tokenizer parallelism, no
# dataloader workers, a single featurization process, and no
# multiprocessing during evaluation.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
model_args.dataloader_num_workers = 0
model_args.process_count = 1
model_args.use_multiprocessing_for_evaluation = False

This should do the trick.
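Pulling the thread's workarounds together, a sketch of a fully single-process run (the model name and data are placeholders, not a confirmed fix):

import os

import pandas as pd
from simpletransformers.t5 import T5Model, T5Args

os.environ["TOKENIZERS_PARALLELISM"] = "false"  # disable tokenizer parallelism

model_args = T5Args()
model_args.use_multiprocessing = False
model_args.use_multiprocessing_for_evaluation = False
model_args.process_count = 1           # featurize in a single process
model_args.dataloader_num_workers = 0  # load batches in the main process

if __name__ == "__main__":
    train_df = pd.DataFrame(
        [["translate", "hello", "bonjour"]],
        columns=["prefix", "input_text", "target_text"],
    )
    model = T5Model("mt5", "google/mt5-base", args=model_args)
    model.train_model(train_df)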

supernovalx commented on Apr 24, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented on Sep 21, 2022