vq-vae-2-pytorch

can't pickle Environment objects

Open MichalZajac opened this issue 4 years ago • 5 comments

Hi,

Sorry to bother you, but I have the following problem when trying to run train_pixelsnail.py on Windows. I have successfully trained the model and got pretty good sample results from the first step. I am not sure about the second step, and the third step fails.

(pytorch-vae) C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master>python train_pixelsnail.py sample2_lmdb
Namespace(batch=32, channel=256, ckpt=None, dropout=0.1, epoch=420, hier='top', lr=0.0003, n_cond_res_block=3, n_out_res_block=0, n_res_block=4, n_res_channel=256, path='sample2_lmdb', sched=None)
C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\torch\nn\parallel\data_parallel.py:26: UserWarning:
    There is an imbalance between your GPUs. You may want to exclude GPU 1 which
    has less than 75% of the memory or cores of GPU 0. You can do so by setting
    the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
    environment variable.
  warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
  0%|                                                                                           | 0/31 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train_pixelsnail.py", line 139, in <module>
    train(args, i, loader, model, optimizer, scheduler, device)
  File "train_pixelsnail.py", line 19, in train
    for i, (top, bottom, label) in enumerate(loader):
  File "C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\tqdm\_tqdm.py", line 1017, in __iter__
    for obj in iterable:
  File "C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
    w.start()
  File "c:\dev\python36\Lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "c:\dev\python36\Lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "c:\dev\python36\Lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "c:\dev\python36\Lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\dev\python36\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Environment objects


(pytorch-vae) C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\dev\python36\Lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "c:\dev\python36\Lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

The lmdb database is 100 GB. Is that OK?

Do you have any advice? Thank you in advance.

MichalZajac, Jul 08 '19 11:07

I think there are some problems between the multiprocessing and lmdb libraries. I think you can fix it temporarily by setting num_workers=1 in DataLoader.

rosinality, Jul 15 '19 06:07

num_workers=0 fixed it (num_workers=1 didn't help).
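
For context, the traceback above goes through popen_spawn_win32.py and reduction.dump, which means Windows starts each DataLoader worker by pickling the objects it needs, and the open lmdb Environment cannot be pickled. With num_workers=0 no worker process is started, so nothing is pickled. If workers are needed, one commonly used alternative is to open the handle lazily inside each process. The following is only a sketch of that idea using the standard library: the class name LazyHandleDataset and its methods are hypothetical, not the repository's dataset class, and a threading.Lock stands in for the lmdb Environment purely because it is also unpicklable.

```python
import pickle
import threading


class LazyHandleDataset:
    """Hypothetical dataset that opens its unpicklable handle on first use."""

    def __init__(self, path):
        self.path = path      # plain string, safe to pickle
        self._handle = None   # opened lazily, once per process

    def _open(self):
        # Stand-in for something like lmdb.open(self.path, readonly=True, lock=False).
        # A threading.Lock is used here only because it cannot be pickled either.
        return threading.Lock()

    def __getitem__(self, index):
        if self._handle is None:
            self._handle = self._open()
        return index  # a real dataset would read record `index` here

    def __getstate__(self):
        # Drop the handle so the object can be pickled even after first use.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


if __name__ == "__main__":
    dataset = LazyHandleDataset("sample2_lmdb")
    dataset[0]                                   # handle is now open in this process
    clone = pickle.loads(pickle.dumps(dataset))  # roughly what spawn does per worker
    print(clone.path, clone._handle)             # the copy reopens on first access
```

Whether this is enough for the actual dataset class in this repository has not been checked here; num_workers=0 remains the fix confirmed in this thread.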

eps696, Jul 17 '19 01:07

Thank you guys for the answers and your time. It works now with num_workers=0. Great, thanks.

MichalZajac, Jul 17 '19 07:07

As for the 100 GB: I have no idea why the lmdb map size was hard-coded. I used 4 * dataset size instead (for JPEG input the factor may need to go up to 10):

import os, glob
map_size = 4 * sum(os.path.getsize(f) for f in glob.glob(args.dataset + '/**', recursive=True))
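
The same estimate can be wrapped in a small helper that counts only regular files and takes the multiplier as a parameter. This is a sketch: the name estimate_map_size and its factor argument are made up for illustration and are not part of the repository.

```python
import os


def estimate_map_size(root, factor=4):
    """Return factor * total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return factor * total
```

The result would then be passed as the map_size argument of lmdb.open, for example with factor=10 for JPEG input as suggested above.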

eps696, Jul 17 '19 22:07

On Linux it is not a problem to set the map size larger than the actual data size, but maybe it is different on Windows.

rosinality, Jul 18 '19 01:07