
EOFError when start training, versioning problem?

Open zephiroth666 opened this issue 2 years ago • 2 comments

I keep getting this error when starting training:

Steps:   0%|          | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "\Github\lora\train_lora_dreambooth.py", line 998, in <module>
    main(args)
  File "\Github\lora\train_lora_dreambooth.py", line 809, in main
    for step, batch in enumerate(train_dataloader):
  File "\Python\Python310\lib\site-packages\accelerate\data_loader.py", line 345, in __iter__
    dataloader_iter = super().__iter__()
  File "\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "\Python\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "\Python\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'DreamBoothDataset.__init__.<locals>.<lambda>'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "****\Python\Python310\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Could this be a versioning problem in /lib/multiprocessing?

zephiroth666 · Jan 03 '23 15:01

Can you show us the versions of the packages you are using (diffusers, PyTorch, Python, etc.)?

cloneofsimo · Jan 08 '23 18:01

I had the same issue. It happens on Windows. I solved it by setting num_workers to 0 rather than 1 in train_lora_dreambooth.py:

 train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=args.train_batch_size,
        shuffle=True,
        collate_fn=collate_fn,
        num_workers=0,
    )
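For context: Windows multiprocessing uses spawn rather than fork, so each DataLoader worker process receives a pickled copy of the dataset, and a lambda defined inside DreamBoothDataset.__init__ cannot be pickled (hence the AttributeError above). num_workers=0 sidesteps this by loading data in the main process. If you want to keep multi-process loading on Windows, a minimal sketch is to move the local lambda to a module-level function; the names below (identity, PicklableDataset) are illustrative, not code from train_lora_dreambooth.py:

    from torch.utils.data import Dataset

    def identity(x):
        # Module-level functions pickle by reference, so spawned workers can
        # import them; a lambda defined inside __init__ cannot be pickled.
        return x

    class PicklableDataset(Dataset):
        def __init__(self, items):
            self.items = items
            # self.transform = lambda x: x  # fails under spawn with
            # "Can't pickle local object '...<locals>.<lambda>'"
            self.transform = identity

        def __len__(self):
            return len(self.items)

        def __getitem__(self, idx):
            return self.transform(self.items[idx])

Either change works; num_workers=0 is the smaller edit, while the module-level function keeps parallel data loading.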

Norod · Jan 25 '23 17:01