ddrm
imagenet_256_cc.yml runtime error
I'm trying to test the 256 ImageNet model on the deblurring task, using the OOD data you provide in your adjacent repository. I'm getting this error:
ERROR - main.py - 2022-07-25 10:25:13,026 - Traceback (most recent call last):
File "/Users/mbejan/Documents/diffusion/ddrm/main.py", line 164, in main
runner.sample()
File "/Users/mbejan/Documents/diffusion/ddrm/runners/diffusion.py", line 161, in sample
self.sample_sequence(model, cls_fn)
File "/Users/mbejan/Documents/diffusion/ddrm/runners/diffusion.py", line 249, in sample_sequence
for x_orig, classes in pbar:
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 438, in __iter__
return self._get_iterator()
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 384, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1048, in __init__
w.start()
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/mbejan/opt/anaconda3/envs/ddrm/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Diffusion.sample_sequence.<locals>.seed_worker'
This is the script that produces the behaviour above:
python main.py --ni \
--config imagenet_256_cc.yml \
--doc ood \
--timesteps 20 \
--eta 0.85 \
--etaB 1 \
--deg deblur_uni \
--sigma_0 0.05
My imagenet_256_cc.yml is the same as the one you provide, apart from the out_of_distribution argument, which is set to true.
#18 is related. I also had the same error. Adding global seed_worker to Diffusion.sample_sequence in diffusion.py fails to resolve the issue:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\shaw\Anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\shaw\Anaconda3\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'seed_worker' on <module 'runners.diffusion' from 'C:\\Users\\shaw\\Documents\\Year 2\\Diffusion Models\\ddrm\\runners\\diffusion.py'>
The reason (in my case) is that when running on Windows the multiprocessing module uses spawn, and so one must (according to the docs):

Wrap most of your main script's code within an if __name__ == '__main__': block, to make sure it doesn't run again (most likely generating an error) when each worker process is launched. You can place your dataset and DataLoader instance creation logic here, as it doesn't need to be re-executed in workers.

Make sure that any custom collate_fn, worker_init_fn or dataset code is declared as top-level definitions, outside of the __main__ check. This ensures that they are available in worker processes. (This is needed since functions are pickled as references only, not bytecode.)
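That last point is the root cause here: pickle stores a function as a reference to its qualified name, so a function defined inside a method cannot be looked up again in the worker process. A minimal demonstration (the outer function is hypothetical, for illustration only):

import pickle

def outer():
    def seed_worker(worker_id):  # local function, like the one in sample_sequence
        pass
    return seed_worker

# AttributeError: Can't pickle local object 'outer.<locals>.seed_worker'
pickle.dumps(outer())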
It is difficult to implement this advice since the seed_worker function needs access to the input args coming from the config file.
Simplest "solution" was to just set the worker_init_fn argument to None, as below (within Diffusion.sample_sequence):
val_loader = data.DataLoader(
    test_dataset,
    batch_size=config.sampling.batch_size,
    shuffle=True,
    num_workers=config.data.num_workers,
    worker_init_fn=None,  # no per-worker seeding, so nothing local needs pickling
    generator=g,
)
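An alternative that keeps per-worker seeding would be to hoist seed_worker to module level, following the pattern from the PyTorch reproducibility docs: a module-level function is picklable under spawn, and it needs no access to args because torch.initial_seed() in a worker already incorporates the base seed of the generator passed to the DataLoader. A sketch (not the repo's original seed_worker, which presumably reads args.seed directly):

import random
import numpy as np
import torch

# Top level of runners/diffusion.py, outside any class or method.
def seed_worker(worker_id):
    # In a worker, torch.initial_seed() == base_seed + worker_id,
    # where base_seed comes from the DataLoader's generator g.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

Then pass worker_init_fn=seed_worker as before.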
@lshaw8317 Hello, I have the same problem with this error:
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Diffusion.sample_sequence.<locals>.seed_worker'
I tried your solution of setting the worker_init_fn argument to None. However, after applying that change, the tqdm bar (indicating sampling progress) freezes at 0% for a while (about 20 seconds), and eventually a new error is raised, shown in the picture below:
I don't know why this MemoryError occurs. Did you encounter it too? Do you know how to solve it? If you need more information about how I implemented the code, I am very willing to provide it. Thanks a lot!
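For what it's worth, a common way to sidestep both the pickling error and the memory cost of spawned workers on Windows is to load data in the main process instead; a sketch, assuming the same variables as the snippet above:

val_loader = data.DataLoader(
    test_dataset,
    batch_size=config.sampling.batch_size,
    shuffle=True,
    num_workers=0,  # no worker processes: nothing is pickled or re-imported
    generator=g,
)

This is slower per batch but avoids the spawn machinery entirely.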