
cannot run due to pickle error

Open dave7895 opened this issue 4 years ago • 14 comments

This is not directly an issue with your project, but rather a request for help. Every time I try to run train.py inside a conda environment, I get the error TypeError: cannot pickle 'Environment' object.

dave7895 avatar Mar 05 '20 20:03 dave7895

Could you try this? https://github.com/pytorch/examples/issues/526

rosinality avatar Mar 05 '20 23:03 rosinality

So you mean directly at the beginning of the main function, not creating it first and then inserting it?

dave7895 avatar Mar 10 '20 17:03 dave7895

Yes, right after imports.

rosinality avatar Mar 11 '20 02:03 rosinality
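Not necessarily the exact change discussed in the linked issue, but a common workaround for the cannot pickle 'Environment' error with lmdb-backed datasets is to open the environment lazily in each worker process, so the Dataset object itself never holds an unpicklable handle. A minimal sketch (class and field names here are illustrative, not taken from this repository):

import lmdb
from torch.utils.data import Dataset

class LazyLMDBDataset(Dataset):
    def __init__(self, path):
        self.path = path
        self.env = None  # opened lazily per process, so the dataset stays picklable

        # read the number of entries once with a short-lived environment
        env = lmdb.open(path, readonly=True, lock=False)
        with env.begin(write=False) as txn:
            self.length = txn.stat()['entries']
        env.close()

    def _ensure_env(self):
        # the first access inside a worker process opens that worker's own handle
        if self.env is None:
            self.env = lmdb.open(self.path, readonly=True, lock=False,
                                 readahead=False, meminit=False)

    def __getitem__(self, index):
        self._ensure_env()
        with self.env.begin(write=False) as txn:
            value = txn.get(str(index).encode('utf-8'))
        return value  # decode / transform as appropriate

    def __len__(self):
        return self.length

On Windows every DataLoader worker is spawned, so everything the worker needs must be picklable; keeping only the path (and not an open Environment) in the dataset avoids that constraint.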

Unfortunately, the error persists.

File "<string>", line 1, in <module>
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

(style-based-gan) C:\Users\dawe_\style-based-gan-pytorch>python train.py database
Traceback (most recent call last):
  File "train.py", line 344, in <module>
    train(args, dataset, generator, discriminator)
  File "train.py", line 53, in train
    data_loader = iter(loader)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object

(style-based-gan) C:\Users\dawe_\style-based-gan-pytorch>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

This is the full error.

dave7895 avatar Mar 16 '20 11:03 dave7895

I think it is hard to use multiprocessing with lmdb on Windows. It would be simpler to use num_workers=0 in the DataLoader.

rosinality avatar Mar 16 '20 12:03 rosinality
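Concretely, that means passing num_workers=0 wherever the DataLoader is built in train.py: with zero workers, data loading happens in the main process and nothing has to be pickled and sent to spawned worker processes, which is the source of the Environment error above. A rough sketch with generic names:

from torch.utils.data import DataLoader

# num_workers=0 keeps data loading in the main process, so the lmdb handle
# held by the dataset never needs to be pickled for a worker process.
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)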

Did that, and now the error changed.

  0%|                                                                                      | 0/3000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 344, in <module>
    train(args, dataset, generator, discriminator)
  File "train.py", line 168, in train
    fake_predict.backward()
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\site-packages\torch\autograd\__init__.py", line 97, in backward
    Variable._execution_engine.run_backward(
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
  0%|                                                                                      | 0/3000000 [00:02<?, ?it/s]

So it started, at least.

dave7895 avatar Mar 19 '20 07:03 dave7895

Could you try CUDA_LAUNCH_BLOCKING=1?

rosinality avatar Mar 19 '20 08:03 rosinality

Where? Right after the imports or in the main function?

dave7895 avatar Mar 19 '20 18:03 dave7895

Set it as an environment variable before running the training script. It will let you spot exactly where the error occurred.

rosinality avatar Mar 20 '20 00:03 rosinality
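On the Windows command prompt that would look roughly like this; CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, so the traceback points at the operation that actually failed rather than at a later one:

set CUDA_LAUNCH_BLOCKING=1
python train.py database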

(style-based-gan) C:\Users\dawe_\style-based-gan-pytorch>python train.py database
  0%|                                                                                      | 0/3000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 344, in <module>
    train(args, dataset, generator, discriminator)
  File "train.py", line 168, in train
    fake_predict.backward()
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\dawe_\Miniconda3\envs\style-based-gan\lib\site-packages\torch\autograd\__init__.py", line 97, in backward
    Variable._execution_engine.run_backward(
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
  0%|                                                                                      | 0/3000000 [00:01<?, ?it/s]

As far as I can tell, the error stays the same.

dave7895 avatar Mar 20 '20 01:03 dave7895

Sorry, I don't know exactly why the error occurs. Sometimes that error can be resolved by inserting .contiguous(), but it is hard to know where it should go.

rosinality avatar Mar 20 '20 01:03 rosinality
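For reference, the workaround usually means calling .contiguous() on the tensor fed into the operation that fails, which copies it into the contiguous memory layout that cuDNN kernels expect. The sketch below is purely illustrative; the variable names are hypothetical and not taken from train.py:

# Hypothetical sketch: make the discriminator input contiguous before the
# forward pass whose backward raises CUDNN_STATUS_NOT_SUPPORTED.
fake_image = generator(latent_code)
fake_predict = discriminator(fake_image.contiguous())
fake_predict = fake_predict.mean()
fake_predict.backward()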

Thank you for your patience. Is there anything I could do to increase verbosity or otherwise pinpoint the exact source of the error, or should I just append .contiguous() at random?

dave7895 avatar Mar 20 '20 10:03 dave7895

Also, sometimes that error occurs when input sizes are too large. Maybe you can try reducing the batch size. But I don't know much about these kinds of errors.

rosinality avatar Mar 20 '20 11:03 rosinality

Thank you for your patience. Is there anything I could do to increase verbosity or otherwise pinpoint the exact source of the error, or should I just append .contiguous() at random?

I met the same problem. Have you solved it?

yangyvhang666 avatar Apr 14 '20 01:04 yangyvhang666