Add memmap cache for Tensor
🚀 The feature
Beyond the on-disk cache and in-memory cache, it would be useful and performant to add a memmap cache for Tensors (similar to the one under tensordict: https://github.com/pytorch-labs/tensordict/blob/main/tensordict/memmap.py). It would improve performance by:
- reducing the overhead of inter-process communication;
- reducing decoding time, etc.: users would read Tensors directly after the first epoch (a rough sketch follows below).
However, there are two major limitations:
- inputs have to be Tensors;
- the whole dataset has to fit on the local filesystem.
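To make the idea concrete, here is a minimal sketch of what such a cache could look like. It uses `numpy.memmap` directly rather than tensordict's implementation, and the `MemmapCachedDataset` class, its constructor arguments, and the cache-file layout are all hypothetical, not an existing API:

```python
import os
import numpy as np
import torch
from torch.utils.data import Dataset

class MemmapCachedDataset(Dataset):
    """Hypothetical sketch of a memmap cache for Tensor samples.

    Epoch 1 decodes each sample once and writes it into an on-disk
    memory-mapped array; afterwards, any process mapping the same file
    reads the raw tensor bytes directly, skipping decoding and avoiding
    serialization over inter-process pipes.
    """

    def __init__(self, source, cache_path, sample_shape, dtype=np.float32):
        self.source = source
        mode = "r+" if os.path.exists(cache_path) else "w+"
        self.cache = np.memmap(cache_path, dtype=dtype, mode=mode,
                               shape=(len(source), *sample_shape))
        # Note: this fill-tracking is per-process; a real implementation
        # would persist it (e.g. in a sidecar file) so workers share it.
        self.filled = np.zeros(len(source), dtype=bool)

    def __len__(self):
        return len(self.source)

    def __getitem__(self, idx):
        if not self.filled[idx]:
            sample = self.source[idx]        # expensive decode, done once
            self.cache[idx] = sample.numpy()
            self.filled[idx] = True
        # Zero-copy view over the memory-mapped bytes.
        return torch.from_numpy(np.asarray(self.cache[idx]))
```

The sketch also makes the two limitations above visible: every sample must be a fixed-shape Tensor, and the full `(len(dataset), *sample_shape)` array must fit on the local filesystem.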
Motivation, pitch
Performance
Alternatives
No response
Additional context
No response
It seems that memory mapping can mean many different things. Does the idea behind this issue correspond to https://www.mathworks.com/help/matlab/import_export/overview-of-memory-mapping.html ?
Hence, storing tensor data in a (potentially large) file to share it between processes and to improve reading time?
Correct. This is inspired by tensordict, to help accelerate multiprocessing.
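As a minimal illustration of that idea, here is what this kind of memory mapping looks like with plain PyTorch's `torch.from_file` (not tensordict's API; the file name is just an example):

```python
import torch

# Write 1000 float32 values to an ordinary file, then map it.
torch.zeros(1000).numpy().tofile("cache.bin")

# shared=True maps the file with MAP_SHARED: writes go back to the
# file, and any other process mapping "cache.bin" sees the same bytes
# without pickling tensors over inter-process pipes.
t = torch.from_file("cache.bin", shared=True, size=1000, dtype=torch.float32)
t += 1.0  # modifies the mapped file in place
```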