
Add memmap cache for Tensor

Open ejguan opened this issue 1 year ago • 2 comments

🚀 The feature

Beyond the on-disk cache and in-memory cache, it would be useful and performant to add a memmap cache (like the one under tensordict: https://github.com/pytorch-labs/tensordict/blob/main/tensordict/memmap.py). It would improve performance by:

  • Reducing the overhead of inter-process communication
  • Reducing decoding time, etc. — users would read Tensors directly after the first epoch.

However, there are two major limitations:

  • Input has to be a Tensor
  • The whole dataset has to fit on the local filesystem
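To make the idea concrete, here is a minimal, hypothetical sketch of a write-once, read-many memmap cache using only the Python standard library (the class name `MemmapCache`, the fixed-size-slot layout, and the `put`/`get` API are illustrative assumptions, not the tensordict implementation). On the first epoch, decoded samples are written into a pre-allocated backing file; later epochs read directly from the memory map, skipping decoding and avoiding per-sample copies between worker processes:

```python
import mmap
import os
import struct
import tempfile

class MemmapCache:
    """Hypothetical fixed-slot memmap cache (illustrative sketch only)."""

    def __init__(self, path, num_items, item_size):
        self.item_size = item_size
        # Pre-allocate the backing file so every slot has a fixed offset.
        with open(path, "wb") as f:
            f.truncate(num_items * item_size)
        self._f = open(path, "r+b")
        self._mm = mmap.mmap(self._f.fileno(), num_items * item_size)

    def put(self, idx, data):
        # Write one decoded sample into its slot (first epoch).
        assert len(data) == self.item_size
        off = idx * self.item_size
        self._mm[off:off + self.item_size] = data

    def get(self, idx):
        # Read a sample back without re-decoding (later epochs).
        off = idx * self.item_size
        return bytes(self._mm[off:off + self.item_size])

# Usage: cache two fixed-size payloads standing in for decoded tensors.
path = os.path.join(tempfile.mkdtemp(), "cache.bin")
cache = MemmapCache(path, num_items=2, item_size=8)
cache.put(0, struct.pack("d", 3.14))
cache.put(1, struct.pack("d", 2.71))
value = struct.unpack("d", cache.get(1))[0]
```

Because the backing file is plain bytes at fixed offsets, multiple DataLoader worker processes could map the same file and share it without serializing tensors over IPC, which is the performance win described above.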

Motivation, pitch

Performance

Alternatives

No response

Additional context

No response

ejguan avatar Jan 25 '23 14:01 ejguan

It seems that memory mapping can mean many different things. Does the idea behind this issue correspond to https://www.mathworks.com/help/matlab/import_export/overview-of-memory-mapping.html ?

Hence, storing tensor data in a (potentially large file) to share it between processes and to improve reading time?

lennartclaas avatar Mar 11 '23 16:03 lennartclaas

> Hence, storing tensor data in a (potentially large file) to share it between processes and to improve reading time?

Correct. This is inspired by tensordict to help accelerate multiprocessing (MP).

ejguan avatar Mar 13 '23 13:03 ejguan