vq-vae-2-pytorch
can't pickle Environment objects
Hi,
Sorry to bother you, but I have the following problem when trying to run train_pixelsnail.py on Windows. I have successfully trained the model and I get pretty good sample results from the first step. I am not sure about the second step, and the third step fails.
(pytorch-vae) C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master>python train_pixelsnail.py sample2_lmdb
Namespace(batch=32, channel=256, ckpt=None, dropout=0.1, epoch=420, hier='top', lr=0.0003, n_cond_res_block=3, n_out_res_block=0, n_res_block=4, n_res_channel=256, path='sample2_lmdb', sched=None)
C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\torch\nn\parallel\data_parallel.py:26: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 1 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
0%| | 0/31 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_pixelsnail.py", line 139, in <module>
train(args, i, loader, model, optimizer, scheduler, device)
File "train_pixelsnail.py", line 19, in train
for i, (top, bottom, label) in enumerate(loader):
File "C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\tqdm\_tqdm.py", line 1017, in __iter__
for obj in iterable:
File "C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
return _DataLoaderIter(self)
File "C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master\pytorch-vae\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
w.start()
File "c:\dev\python36\Lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "c:\dev\python36\Lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "c:\dev\python36\Lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "c:\dev\python36\Lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "c:\dev\python36\Lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Environment objects
(pytorch-vae) C:\Users\admin\venv\PyTorch_tests\vq-vae-2-pytorch-master>Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\dev\python36\Lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "c:\dev\python36\Lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
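For context, the root cause is that on Windows multiprocessing uses spawn, so each DataLoader worker receives a pickled copy of the dataset, and an open lmdb Environment holds an OS handle that cannot be pickled. A small stdlib-only illustration of the same failure (using a threading.Lock as a stand-in for the unpicklable lmdb handle; the class name is just for this sketch):

```python
import pickle
import threading

class PickleDemoDataset:
    """Stand-in for a Dataset that holds an open lmdb Environment."""
    def __init__(self):
        # threading.Lock is unpicklable, like lmdb's Environment handle
        self.env = threading.Lock()

# This is what spawning a DataLoader worker does under the hood on Windows:
try:
    pickle.dumps(PickleDemoDataset())
    error = None
except TypeError as e:
    error = e

print("worker spawn would fail with:", error)
```

On Linux the default start method is fork, which copies the process memory without pickling, which is why the same code runs there.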
The lmdb database is 100GB - is that OK?
Do you have any advice? Thank you in advance.
I think there are some problems between the multiprocessing and lmdb libraries... I think you can fix it temporarily by setting num_workers=1 in the DataLoader.
num_workers=0 fixed it (1 didn't help)
Thank you guys for the answers and your time. It works now with num_workers=0. Great. Thx
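If you want to keep num_workers > 0, a common alternative is to open the lmdb environment lazily inside the worker instead of in __init__, so the dataset instance stays picklable. A stdlib-only sketch of the pattern (the lmdb calls are replaced by a placeholder, since this illustrates the idea, not the repo's actual dataset class):

```python
import pickle

class LazyLMDBDataset:
    """Sketch: defer opening the unpicklable handle until first access."""
    def __init__(self, path):
        self.path = path
        self.env = None  # opened lazily, so the instance pickles cleanly

    def _ensure_open(self):
        if self.env is None:
            # real code: self.env = lmdb.open(self.path, readonly=True, lock=False)
            self.env = object()  # placeholder for the opened Environment

    def __getitem__(self, index):
        self._ensure_open()
        return index  # real code would read record `index` from self.env

    def __getstate__(self):
        # drop the open handle when pickled for a worker process
        state = self.__dict__.copy()
        state['env'] = None
        return state

ds = LazyLMDBDataset('sample2_lmdb')
ds[0]                                    # opens the env in this process
clone = pickle.loads(pickle.dumps(ds))   # pickles fine despite the open env
```

Each worker then reopens its own environment on first __getitem__ call, which is exactly what the spawn start method requires.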
As for the 100GB: no idea why the lmdb map size was fixed; I have used 4 × the dataset size instead (for jpg you may want to increase it to 10):
import os, glob
map_size = 4 * sum(os.path.getsize(f) for f in glob.glob(args.dataset + '/**', recursive=True))
As on Linux it is not very problematic to set it larger than the actual data size... but maybe it is different on Windows.
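The two lines above can be wrapped into a small helper; the 4× factor is just the heuristic from that comment, not a hard rule, and filtering to regular files avoids counting directory entries (the function name is illustrative):

```python
import glob
import os
import tempfile

def lmdb_map_size(dataset_dir, factor=4):
    """Heuristic map size: factor x the total size of all files in the tree."""
    total = sum(
        os.path.getsize(f)
        for f in glob.glob(os.path.join(dataset_dir, '**'), recursive=True)
        if os.path.isfile(f)
    )
    return factor * total

# tiny demonstration on a throwaway directory
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, 'a.bin'), 'wb') as f:
        f.write(b'\x00' * 1000)
    print(lmdb_map_size(d))  # 4 * 1000 = 4000
```

The result would then be passed as the map_size argument when opening the database, e.g. lmdb.open(path, map_size=lmdb_map_size(path)); on Linux an oversized map is cheap because the data file stays sparse.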