geo-deep-learning

num_workers > 0 not working at inference on Windows

Open • mpelchat04 opened this issue 2 years ago • 2 comments

We've encountered an error when running inference on the test dataset.
The error does not occur on Linux, only on Windows (on both CPU-only and GPU machines).
It can be avoided by setting num_workers to 0 (i.e., no multiprocessing).
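The num_workers=0 workaround can be limited to Windows so Linux runs keep their parallel loading. This is a minimal sketch, assuming a hypothetical helper `resolve_num_workers` (not part of GDL) that keeps the configured worker count on other platforms and falls back to 0 when `sys.platform == "win32"`:

```python
import sys


def resolve_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Return a DataLoader worker count that is safe on the given platform.

    Hypothetical helper, not part of GDL: on Windows ("win32") it falls back
    to single-process loading (num_workers=0) to sidestep the pickling error.
    """
    if requested < 0:
        raise ValueError("num_workers must be >= 0")
    if platform == "win32":
        return 0
    return requested


# Usage, assuming `dataset` and `cfg_num_workers` already exist:
# from torch.utils.data import DataLoader
# loader = DataLoader(dataset, num_workers=resolve_num_workers(cfg_num_workers))
```

Because `platform` defaults to `sys.platform`, callers normally pass only the configured value; the explicit argument just makes the helper easy to test. This trades loading speed on Windows for correctness and does not fix the underlying pickling problem.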

Related references:

  • https://discuss.pytorch.org/t/errors-when-using-num-workers-0-in-dataloader/97564/3
  • https://discuss.pytorch.org/t/dataloader-multiprocessing-error-cant-pickle-odict-keys-objects-when-num-workers-0/43951

Here's the stack trace:

Instantiating inference generator for looping over imagery chips
  0%|          | 0/78 [00:00<?, ?it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "c:\Users\maturgeo\Documents\Processus\pycharm\geo-deep-learning\GDL.py", line 71, in run_gdl
    task(cfg)
  File "c:\Users\maturgeo\Documents\Processus\pycharm\geo-deep-learning\inference_segmentation.py", line 354, in main
    for inference_prediction in eval_gen:
  File "c:\Users\maturgeo\Documents\Processus\pycharm\geo-deep-learning\inference_segmentation.py", line 176, in eval_batch_generator
    for batch in tqdm(dataloader, disable=not verbose):
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\site-packages\torch\utils\data\dataloader.py", line 1048, in __init__
    w.start()
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\maturgeo\Miniconda3\envs\geo_deep_env\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'pad.<locals>._pad'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.  
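The last frames of the trace point at the cause. On Windows, `DataLoader` workers are started with the `spawn` method (`popen_spawn_win32`), so the dataset object, including any transform it holds, is pickled and sent to each worker. A function defined inside another function (here `_pad`, local to `pad`) cannot be pickled, whereas Linux has traditionally defaulted to `fork`, which copies the parent process without pickling. The sketch below reproduces the failure with the standard library only; `pad`, `Pad` and `pad_to` are hypothetical stand-ins, not GDL's actual padding code. It shows that a module-level callable class, or `functools.partial` over a module-level function, pickles where the closure does not.

```python
import pickle
from functools import partial


def pad(size: int):
    """Hypothetical factory returning a closure, like the failing `pad`."""
    def _pad(values: list) -> list:
        # Right-pad with zeros up to `size` items.
        return values + [0] * (size - len(values))
    return _pad


class Pad:
    """Picklable alternative: a module-level class storing its parameters."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __call__(self, values: list) -> list:
        return values + [0] * (self.size - len(values))


def pad_to(values: list, size: int) -> list:
    """Module-level function; bind `size` with functools.partial."""
    return values + [0] * (size - len(values))


def is_picklable(obj) -> bool:
    """True if `obj` survives pickle.dumps (what spawn-based workers need)."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(pad(4)))                   # False: local object
    print(is_picklable(Pad(4)))                   # True
    print(is_picklable(partial(pad_to, size=4)))  # True
```

Either picklable form can replace the closure inside the dataset's transform, after which num_workers > 0 should no longer hit this particular error on Windows.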

mpelchat04, Jul 21 '22

I should have added that this currently only applies to this branch of the code: https://github.com/remtav/geo-deep-learning/tree/222-stac-item-input
@remtav

mpelchat04, Jul 21 '22

We recommend using Linux with GDL; I'm not sure we can catch all platform-dependent problems.

valhassan, Aug 11 '22