Support for mmap across machines?

Open complyue opened this issue 3 years ago • 5 comments

I have a motivating use case, favoring direct mmap over serialization/deserialization, where a Compact Region tends to be reused by many machines, from multiple parallel processes, and even multiple lifecycles of processes over time.

The data is a single heap of cyclic data structures (it can be immutable, at a small ergonomic cost), which I propose to store as a Compact Region and which is actively scanned with heavy parallelism. A consuming node machine maintains a fixed number of concurrent processes performing arbitrary jobs; each process exits after finishing some jobs and is replaced by a new process that carries on with more. Some jobs may share the same heap of data, so it is highly desirable for such a heap to be cached automatically in the OS kernel's page cache.

This is pretty straightforward with a virtual file (e.g. one backed by a FUSE filesystem) that is mmap'ed and has its content fetched on demand. A physical file on shared storage (e.g. mounted via NFS) can be mmap'ed in the same way, which is easier to implement but less flexible.
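A minimal C sketch of the plain-file variant described above, assuming a POSIX system. The helper name `map_file_readonly` is hypothetical, and the sketch only shows the OS-level mapping: it says nothing about whether the bytes in the file form a valid Compact Region. With `MAP_SHARED` and `PROT_READ`, pages are faulted in on demand and backed by the kernel page cache, so several processes mapping the same file can share them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns the mapping and stores its length in *len_out,
 * or returns NULL if the file cannot be opened, is empty, or cannot be mapped. */
void *map_file_readonly(const char *path, size_t *len_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* No address is requested here, so the kernel chooses where to place it. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    if (p == MAP_FAILED) {
        return NULL;
    }
    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would release the mapping with `munmap(p, len)` once it is done with the data.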

I suppose a Compact Region can be read right away if I manage to have it mmap'ed at the same address on another machine, but that is overly restrictive. I wonder whether pointers within a Compact Region are already relocation-aware and would work as expected, or how much work would be needed to achieve that?
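To make the relocation question concrete, here is a small C sketch of the general problem. The `struct node` layout is purely hypothetical and is not GHC's heap object layout. It stores each link twice, once as an absolute pointer and once as an offset from the region base, then copies the bytes into a second mapping at a different address to stand in for another process mapping the same data elsewhere.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical layout for illustration only. */
struct node {
    int value;
    struct node *next_abs; /* absolute pointer: tied to the original base address */
    size_t next_off;       /* offset from the region base: independent of the address */
};

/* Build a two-node list at the start of the region. */
static void build_list(void *base) {
    struct node *n = base;
    n[0].value = 1;
    n[0].next_abs = &n[1];
    n[0].next_off = sizeof(struct node);
    n[1].value = 2;
    n[1].next_abs = NULL;
    n[1].next_off = 0;
}

/* Does the stored absolute pointer point at the second node of THIS mapping? */
static int abs_pointer_is_valid_at(const void *base) {
    const struct node *head = base;
    const void *expected = (const char *)base + sizeof(struct node);
    return (const void *)head->next_abs == expected;
}

/* Follow the offset link; this works wherever the region is mapped. */
static int second_value_via_offset(const void *base) {
    const struct node *head = base;
    const struct node *next =
        (const struct node *)((const char *)base + head->next_off);
    return next->value;
}

/* Returns 1 if the absolute pointer is valid only at the original address
 * while the offset link still works in the copy; -1 if mapping fails. */
int relocation_demo(void) {
    size_t len = 4096;
    void *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    void *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        return -1;
    }
    build_list(a);
    memcpy(b, a, len); /* same bytes, different address */
    int ok = abs_pointer_is_valid_at(a) &&
             !abs_pointer_is_valid_at(b) &&
             second_value_via_offset(b) == 2;
    munmap(a, len);
    munmap(b, len);
    return ok;
}
```

In the copy, the absolute pointer still refers to the first mapping, which is why data that holds ordinary pointers can only be used in place when it is mapped at the address it was built for.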

And if code changes are needed, can they be done in a library separate from stock GHC?

complyue avatar Aug 12 '20 07:08 complyue

On second thought, I realize this also needs a Compact Region building API that takes a designated mmap'ed region as the target storage space, instead of malloc-on-demand or something similar. Is this feasible as well?

complyue avatar Aug 12 '20 08:08 complyue

The pointers will never automatically relocate: that would require GHC to generate different code to process compact region pointers, and the point is that you don't have to recompile anything. You'll have to map the memory region at exactly the same address everywhere.
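As a sketch of what "exactly the same address" means at the OS level, assuming POSIX `mmap`: the helper below (the name `map_exactly_at` is hypothetical) passes the desired address as a hint and treats any other placement as failure. It uses an anonymous mapping for brevity, and it does not make GHC's memory manager allocate inside the mapped region.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of anonymous memory exactly at `want`.
 * Without MAP_FIXED the address is only a hint, so the result is checked;
 * if the kernel placed the mapping elsewhere, it is unmapped and NULL is returned. */
void *map_exactly_at(void *want, size_t len) {
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED) {
        return NULL;
    }
    if (got != want) {
        munmap(got, len);
        return NULL;
    }
    return got;
}
```

The hint-and-check approach is chosen over `MAP_FIXED` because `MAP_FIXED` silently replaces whatever is already mapped at that address. The address has to be free in every process that maps the region, which is hard to guarantee across machines.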

ezyang avatar Aug 12 '20 16:08 ezyang

I get it, thanks. But I'm not aware of any API I can use to build a Compact Region at a specified address (within an mmap'ed region). Does such an API already exist?

complyue avatar Aug 13 '20 06:08 complyue

Oh, I forgot about some internal details of our implementation. Since compact regions have to live in honest-to-goodness GHC blocks in the memory manager, what you want may be somewhat difficult to actually do; at least, it's not supported out of the box here.

ezyang avatar Aug 14 '20 03:08 ezyang

I'm not very familiar with the internals of GHC. Does the memory manager have some sort of extension mechanism that would let me mmap a region and persuade the allocator to use it?

I'd think with heavier use of this, a parallel Haskell implementation may perform much better on workloads with large immutable datasets as shared input.

complyue avatar Aug 14 '20 08:08 complyue