pytorch-meta-dataset

Exception when num_workers > 0 on Windows, works on Linux

Open • jfb54 opened this issue 3 years ago • 4 comments

On Windows 10, if num_workers > 0, you get the following exception:

    Traceback (most recent call last):
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: can't pickle generator objects
    python-BaseException
    Traceback (most recent call last):
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    EOFError: Ran out of input
    python-BaseException

jfb54 • Mar 13 '21 22:03

I believe this is due to the way my datasets are instantiated. For instance, when an EpisodicDataset is instantiated, it creates a list of generators at https://github.com/mboudiaf/pytorch-meta-dataset/blob/5c4e85b149cf7079789190a6326c73bcc7efd1f6/pytorch_meta_dataset/pipeline.py#L100. The problem is that generator objects cannot be pickled, and pickling the dataset seems to be exactly what happens on Windows when multiprocessing is enabled (i.e., num_workers > 0). I suspect the dataset is created in the main process and then pickled so that the worker processes can load it.
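A quick way to check this diagnosis without a Windows machine: on Windows, Python's multiprocessing only supports the "spawn" start method, so each DataLoader worker receives the dataset by pickling it, whereas on Linux the default start method has historically been "fork", which copies the parent process without pickling anything. Calling pickle.dumps on the dataset directly should therefore reproduce the Windows error on any OS (passing multiprocessing_context="spawn" to torch.utils.data.DataLoader should likewise mimic the Windows behaviour on Linux). The sketch below is a minimal illustration, assuming a hypothetical GeneratorHoldingDataset (not a class from this repository) that stores generators in __init__, plus a hypothetical is_picklable helper.

```python
import pickle


class GeneratorHoldingDataset:
    """Hypothetical stand-in for a dataset that builds generators in __init__."""

    def __init__(self, num_sources):
        # Generators are created eagerly and stored on the instance,
        # so they become part of the object's pickled state.
        self.generators = [self._sample(i) for i in range(num_sources)]

    @staticmethod
    def _sample(source_id):
        # Toy infinite sampler.
        while True:
            yield source_id


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, as spawn-based workers require."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    dataset = GeneratorHoldingDataset(num_sources=3)
    print(is_picklable(dataset))  # False: the stored generators cannot be pickled
```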

So the workaround would be to remove this line and create the generators in __iter__ (lazily, only when needed) rather than in __init__. This should be doable with a try/except. Since I do not have Windows 10, I unfortunately cannot reproduce this error, but I would be happy to help debug it further :)
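A minimal sketch of this workaround, assuming the dataset is consumed through __iter__ (as with torch.utils.data.IterableDataset): __init__ keeps only plain, picklable data, and the generators are built inside __iter__, i.e. in whichever process actually iterates. LazyEpisodicDataset and its toy _sample generator are hypothetical stand-ins, not the repository's EpisodicDataset, and the class is written in plain Python so it runs without PyTorch.

```python
import itertools
import pickle


class LazyEpisodicDataset:
    """Hypothetical stand-in: __init__ stores only picklable data."""

    def __init__(self, class_ids):
        # Plain list of ids; safe to pickle and send to worker processes.
        self.class_ids = list(class_ids)

    def _sample(self, class_id):
        # Toy infinite sampler standing in for a real per-class reader.
        index = 0
        while True:
            yield (class_id, index)
            index += 1

    def __iter__(self):
        # Generators are created here, in the process that iterates,
        # so they never need to be pickled.
        generators = [self._sample(c) for c in self.class_ids]
        while True:
            # Round-robin over classes, one item from each in turn.
            for gen in generators:
                yield next(gen)


if __name__ == "__main__":
    dataset = LazyEpisodicDataset(class_ids=[0, 1])
    # Simulates what a spawn-based worker receives: a pickled copy.
    restored = pickle.loads(pickle.dumps(dataset))
    print(list(itertools.islice(restored, 4)))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

With several workers, each worker calls __iter__ on its own copy, so a real implementation would also need to decide how sources or random seeds are split between workers (torch.utils.data.get_worker_info() exposes the worker id for this).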

mboudiaf • Mar 15 '21 01:03

Thanks for clarifying. I have a Linux machine as well, so I am not blocked. I may try your suggestion.

jfb54 • Mar 15 '21 14:03

I have tried to fix the issue by implementing the workaround I proposed earlier. Please let me know whether it fixes the issue on Windows! Thanks in advance :)

mboudiaf • Apr 01 '21 18:04

Thanks for working on this. Unfortunately, there is still a problem on Windows: with num_workers > 0, I now get a different error:

        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'Reader.construct_class_datasets.<locals>.decode_image'
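This is a different pickling limitation: pickle stores functions by reference (module plus qualified name), so a function defined inside another function or method, whose qualified name contains <locals>, cannot be pickled; lambdas fail for the same reason. Presumably decode_image is defined inside Reader.construct_class_datasets and ends up stored on the dataset. One common fix, sketched here as an assumption rather than the repository's actual change, is to move the function to module level and bind any extra arguments with functools.partial. make_local_decoder, make_picklable_decoder, pickles and the size argument are all hypothetical.

```python
import functools
import pickle


def make_local_decoder(size):
    """Mimics the failing pattern: returns a function defined inside another function."""
    def decode_image(raw):
        # Toy "decoding": just truncate the input.
        return raw[:size]
    return decode_image


def decode_image(raw, size):
    """Module-level version: pickle can find it by module and name."""
    return raw[:size]


def make_picklable_decoder(size):
    # A partial object pickles its module-level function by reference
    # and its bound arguments by value.
    return functools.partial(decode_image, size=size)


def pickles(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
    except (AttributeError, TypeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    print(pickles(make_local_decoder(2)))      # False: qualified name contains <locals>
    print(pickles(make_picklable_decoder(2)))  # True
```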

jfb54 • Apr 02 '21 09:04