MPRS blocks indefinitely
🐛 Describe the bug
- Create a pipeline and use MPRS with 'spawn' (the same behavior can possibly also be reproduced with 'fork')
- Have the worker process fail to start due to an error during module import (other scenarios may also trigger this)
- The main process now blocks endlessly in `[x for x in dataloader]`
- The only way to terminate the process is to send a KeyboardInterrupt.
Solution: Add a timeout parameter to the MPRS constructor and make the blocking `response = self.request_queue.get(block=block)` call in `get_new_request` (torchdata/dataloader2/communication/protocol.py, line 104) fail after that timeout period.
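A minimal sketch of the proposed timeout, assuming the protocol object simply wraps a queue with the standard `get(block, timeout)` interface (both `queue.Queue` and `multiprocessing.Queue` have it and raise `queue.Empty` when the timeout expires). `TimeoutRequestReader`, `RequestTimeoutError` and the `timeout` argument are hypothetical names for illustration, not existing torchdata API:

```python
import queue
from typing import Any, Optional


class RequestTimeoutError(RuntimeError):
    """Hypothetical error: no message arrived within the configured timeout."""


class TimeoutRequestReader:
    """Hypothetical stand-in for the protocol object in protocol.py.

    Only the waiting logic is sketched: a bounded wait replaces the
    unbounded ``self.request_queue.get(block=block)`` call.
    """

    def __init__(self, request_queue: Any, timeout: Optional[float] = None) -> None:
        self.request_queue = request_queue
        self.timeout = timeout  # None keeps today's behavior: wait forever

    def get_new_request(self, block: bool = True) -> Any:
        try:
            # queue.Queue and multiprocessing.Queue both accept block/timeout
            # and raise queue.Empty if nothing arrives in time.
            return self.request_queue.get(block=block, timeout=self.timeout)
        except queue.Empty:
            if not block:
                raise  # non-blocking callers keep the plain queue.Empty
            raise RequestTimeoutError(
                f"no message within {self.timeout} s; "
                "a worker process may have failed to start"
            ) from None


reader = TimeoutRequestReader(queue.Queue(), timeout=0.1)
try:
    reader.get_new_request()
except RequestTimeoutError as exc:
    print(exc)
```

With `timeout=None` the call still waits forever, so the current behavior could stay the default while MPRS opts in to a bounded wait.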
Versions
https://github.com/pytorch/data/commit/e78ab6c9ec94f05f0a350ced7fe571f6863c20ec
Thanks for reporting this.
Do you have a simple script to re-create this behavior?
Unfortunately, no. The import errors in the worker processes are caused by my environment being screwed up, and import order actually matters in that regard.
Unfortunately, I currently don't have the capacity to create a minimal reproducible example for you. All I can do right now is report my observations.
I suppose a module that checks during import whether the current process is a child process and, if so, raises an error would be enough to provoke this.
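The suggested trigger can be sketched with the standard library alone. This is a hypothetical reproduction that does not use torchdata: `fail_if_child` simulates the failing import via `multiprocessing.parent_process()` (Python 3.8+, returns `None` only in the main process), and the parent polls the queue while checking `is_alive()` instead of blocking on a bare `get()`. It assumes the 'fork' context (POSIX only) so the snippet stays self-contained; with 'spawn', the check would have to live in a module imported by the main script.

```python
import multiprocessing
import queue


def fail_if_child() -> None:
    # The suggested trigger: refuse to load inside a child process.
    if multiprocessing.parent_process() is not None:
        raise ImportError("simulated import failure in worker process")


def worker(request_queue) -> None:
    fail_if_child()  # the worker dies here, before it ever replies
    request_queue.put("ready")


def wait_for_worker(timeout: float = 5.0, poll: float = 0.1) -> str:
    # Assumption: 'fork' keeps the sketch self-contained (no importable module needed).
    ctx = multiprocessing.get_context("fork")
    request_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(request_queue,))
    proc.start()
    waited = 0.0  # approximate; ignores time spent outside get()
    try:
        while waited < timeout:
            try:
                return request_queue.get(timeout=poll)
            except queue.Empty:
                waited += poll
                # A bare get() would wait forever here; checking liveness
                # surfaces the dead worker instead.
                if not proc.is_alive():
                    return f"worker exited with code {proc.exitcode}"
        return "timed out"
    finally:
        proc.join(timeout=1)
        if proc.is_alive():
            proc.terminate()


if __name__ == "__main__":
    print(wait_for_worker())
```

Checking worker liveness between short polls is an alternative to a single fixed timeout: it reports the failure as soon as the worker exits rather than after the full timeout.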