Utilize `multiprocessing.shared_memory` in `DataLoader2` for Performance Improvements

Open NivekT opened this issue 2 years ago • 5 comments

🚀 The feature

Given that we will not support Python 3.7 in future releases, we can use `multiprocessing.shared_memory`, which was introduced in Python 3.8.

It could potentially replace some of our existing interprocess communication, notably the serialization/deserialization and the queues. We will first have to evaluate feasibility; it is not clear to me how flexible it is (i.e., whether it can store arbitrary objects without serialization).
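As a starting point for that evaluation, here is a minimal sketch of what the standard library exposes. `SharedMemory` hands out a raw byte buffer (`buf`, a `memoryview`), so fixed-size primitives can be packed in directly, while an arbitrary Python object still has to be turned into bytes first. The helper names `roundtrip_floats` and `roundtrip_object` are hypothetical and not part of `DataLoader2`.

```python
import pickle
import struct
from multiprocessing import shared_memory


def roundtrip_floats(values):
    """Pack floats straight into a shared block and read them back (no pickle)."""
    fmt = f"{len(values)}d"  # native doubles; assumes a non-empty list
    shm = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    try:
        struct.pack_into(fmt, shm.buf, 0, *values)
        return list(struct.unpack_from(fmt, shm.buf, 0))
    finally:
        shm.close()
        shm.unlink()


def roundtrip_object(obj):
    """An arbitrary object must still be serialized before it fits in the buffer."""
    payload = pickle.dumps(obj)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # The block may be rounded up to a page size, so slice by payload length.
        shm.buf[:len(payload)] = payload
        return pickle.loads(bytes(shm.buf[:len(payload)]))
    finally:
        shm.close()
        shm.unlink()
```

So the likely win is avoiding copies through pipes and queues for buffer-like data, rather than removing serialization for every object type.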

Some possible areas of improvement include:

  1. Worker prefetchers can write to shared memory that the main process can access, rather than filling their own buffer, serializing/deserializing, and putting the result into a queue
  2. The single dispatch mechanism can write to shared memory that worker processes can read, skipping most of the existing IPC
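A rough sketch of the hand-off both items rely on, under stated assumptions: the writer copies a payload into a named block, and only the short block name needs to travel over the existing queue. `worker_write`, `main_read`, and the length-prefixed layout are hypothetical illustrations, not existing `DataLoader2` code. For the dispatch case the roles are simply reversed.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: an 8-byte length header followed by the payload bytes.
HEADER = struct.Struct("q")


def worker_write(payload: bytes) -> str:
    """Writer side: copy the payload into a new block and return the block name."""
    shm = shared_memory.SharedMemory(create=True, size=HEADER.size + len(payload))
    HEADER.pack_into(shm.buf, 0, len(payload))
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    name = shm.name
    # Drops only this handle. On Windows the block goes away once the last
    # handle is closed, so a real writer would wait for the reader to attach.
    shm.close()
    return name


def main_read(name: str) -> bytes:
    """Reader side: attach by name, copy the payload out, then free the block."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (length,) = HEADER.unpack_from(shm.buf, 0)
        return bytes(shm.buf[HEADER.size:HEADER.size + length])
    finally:
        shm.close()
        shm.unlink()
```

In a real loader `worker_write` would run in a worker process and `main_read` in the main process; the sketch leaves out synchronization, block reuse, and cleanup when a process crashes.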

Motivation, pitch

If used correctly, it has the potential to greatly improve multiprocessing data loading performance.

Alternatives

Keep things as they are if we do not find improvements or if it is too complex/premature to use.

Additional context

No response

NivekT, Feb 17 '23 20:02