Utilize `multiprocessing.shared_memory` in `DataLoader2` for Performance Improvements
🚀 The feature
Given that we will drop support for Python 3.7 in future releases, we can utilize `multiprocessing.shared_memory`, which was introduced in Python 3.8.
It can potentially replace some of our existing interprocess communication, notably the serialization/deserialization steps and the queues. We will first have to evaluate feasibility; it is not clear to me how flexible it is (i.e., whether it can store arbitrary objects without serialization).
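As a starting point for the feasibility check, here is a minimal sketch of what the standard library offers. `SharedMemory` exposes a raw byte buffer (a `memoryview`), so an arbitrary Python object still has to be converted to bytes before it can be stored; `ShareableList` avoids that step but is documented to accept only `int`, `float`, `bool`, `str`, `bytes`, and `None`. The `sample` object below is made up for illustration and is not taken from `DataLoader2`.

```python
import pickle
from multiprocessing import shared_memory

# Hypothetical sample; stands in for whatever a DataPipe would yield.
sample = {"index": 3, "values": [1.0, 2.5]}

# The block only stores bytes, so the object is pickled first.
payload = pickle.dumps(sample)

shm = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    # The allocated size may be rounded up by the OS, so slice explicitly.
    shm.buf[:len(payload)] = payload
    # Copy the bytes out and rebuild the object.
    restored = pickle.loads(bytes(shm.buf[:len(payload)]))
finally:
    shm.close()   # release this process's view of the block
    shm.unlink()  # free the block itself (done once, by the owner)
```

In other words, shared memory by itself would remove the queue transfer but not the serialization cost, unless the data is already a flat buffer (for example, the storage behind an array or tensor).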
Some possible areas of improvements include:
- Worker prefetchers can write to shared memory that the main process reads directly, rather than filling their own buffer, serializing/deserializing the data, and putting it into a queue
- The single dispatch mechanism can write to shared memory that is readable by worker processes, skipping most of the existing IPC
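The first idea could look roughly like the sketch below: the main process allocates a block, a worker attaches to it by name and writes a payload, and the main process reads it back without a queue transfer. The helper names (`worker_write`, `main_read`) and the length-prefixed layout are assumptions made for illustration, not existing `DataLoader2` APIs, and all synchronization other than `join()` is omitted.

```python
import struct
from multiprocessing import Process, shared_memory

# Hypothetical layout: an 8-byte little-endian length, then the payload.
HEADER = struct.Struct("<Q")


def worker_write(shm_name: str, payload: bytes) -> None:
    """Attach to an existing block by name and write a length-prefixed payload."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        HEADER.pack_into(shm.buf, 0, len(payload))
        shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    finally:
        shm.close()  # the worker only detaches; the owner unlinks


def main_read(shm: shared_memory.SharedMemory) -> bytes:
    """Read the length prefix, then copy out exactly that many bytes."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    return bytes(shm.buf[HEADER.size:HEADER.size + length])


if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=1024)
    try:
        worker = Process(target=worker_write, args=(shm.name, b"batch-0"))
        worker.start()
        worker.join()  # crude synchronization: wait for the write to finish
        print(main_read(shm))
    finally:
        shm.close()
        shm.unlink()
```

A real prefetcher would still need a signal that a slot is ready and a policy for reusing slots (for instance, passing slot names over a small queue), so the queue may shrink to a control channel rather than disappear.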
Motivation, pitch
If used correctly, shared memory has the potential to greatly improve multiprocessing data loading performance.
Alternatives
Keep things as they are if we do not find improvements or if it is too complex/premature to use.
Additional context
No response