
Support more ways of using mmap

Open RalfJung opened this issue 1 year ago • 12 comments

In a lengthy Zulip discussion, it was discovered that there are modes of use of mmap that are perfectly fine within the scope of Rust's memory model, but not supported by Miri's current implementation:

  • do an initial big reservation with MAP_NORESERVE, letting the kernel pick a suitable memory range and reserve that address space. (Yes, MAP_NORESERVE still reserves the address space. Talk about confusing flag names...) This may overcommit if permitted by the kernel (that's what "noreserve" refers to), but the memory is now all read/write accessible.
  • then later do smaller mmap calls in that range that actually "reserve" the memory (no more overcommit). These may set some flags to request huge pages if possible. They will (may?) also erase the previous contents of the re-mapped ranges, but they don't change the range of memory that is read/write accessible, so this is fine with our current memory model.

See here for some example code. Thanks to Nils for helping with the exploration here!

RalfJung avatar May 14 '24 08:05 RalfJung

I haven't read the discussion fully, but here are the unsupported cases I encountered:

  • Read-only mappings created using PROT_READ (i.e. without PROT_WRITE),
  • Mappings populated with MAP_POPULATE,
  • Reserving address space with PROT_NONE,
  • Mappings with MAP_FIXED (to populate address range created with PROT_NONE).

newpavlov avatar Jul 03 '24 12:07 newpavlov

I think at one point I started working on all of these then backed out. So here are some notes.

  • Mappings populated with MAP_POPULATE,

Is this useful without supporting file mappings? Adding file mappings would be a whole thing in itself; I don't think we can support MAP_SHARED so it would be an odd sort of thing to tell users about because most users in the ecosystem MAP_SHARED their files even if they only want to read from them.

  • Reserving address space with PROT_NONE,

I don't know what semantics people expect when a program tries to access PROT_NONE memory. Making the interpreter halt is probably the only viable option, because otherwise we'd have to... execute the segfault handler? I don't think it would be right to report UB here. Most likely PROT_NONE would have to be implemented in before_memory_read?

  • Mappings with MAP_FIXED (to populate address range created with PROT_NONE).

I tripped over my own feet trying to wire this up before, because of the many ways that MAP_FIXED can be used. But perhaps with the constraint that every mmap call returns a separate allocation it's simpler now.

Can you link your codebase that uses these APIs? That would be quite educational.

saethlin avatar Jul 03 '24 23:07 saethlin

Is this useful without supporting file mappings?

In our case we use it to reserve physical memory. Our program allocates one big memory chunk at startup and then works mostly with it. As I understand it, relying on MAP_NORESERVE for this would be a misuse of the flag, since it's primarily about swap space, and in our case we disable swap completely for the mapping by default using mlock.

I don't know what semantics people expect when a program tries to access PROT_NONE memory.

We use it to reserve one big contiguous chunk of virtual memory, which then gets mapped using MAP_FIXED. Depending on app configuration, one part of the chunk can use huge pages. We also use it for pseudo-vector data structures, which allocate the requested capacity with PROT_NONE; we then map it in gradually, page by page, with mremap using pages from a common pool.

Can you link your codebase that uses these APIs?

Unfortunately, it's a proprietary product and we do not have plans to open source it in the near future.

newpavlov avatar Jul 04 '24 00:07 newpavlov

and we map it gradually page-by-page using mremap

Do you rely on multiple mremap calls extending a single allocation? Or is the address range made available by each mremap call treated as a separate allocation?

I'm asking because the model that Miri implements right now is that mmap and mremap behave like malloc and realloc in the sense that no matter what the address values actually are, you cannot use ptr::offset to walk from one call to realloc to the allocation produced by another realloc call. If you need to be able to do that, we might have a deeper problem.

saethlin avatar Jul 04 '24 03:07 saethlin

Do you rely on multiple mremap calls extending a single allocation?

This one. We effectively implement a custom realloc which guarantees address stability of the allocation.

If you need to be able to do that, we might have a deeper problem.

Right now we rely on cfg(miri) to map the full capacity at once to work around this restriction. This means that Miri tests run slightly different code, but it's better than nothing.

newpavlov avatar Jul 04 '24 11:07 newpavlov

This one. We effectively implement a custom realloc which guarantees address stability of the allocation.

There's been a bunch of discussion around that, but the gist is that currently this isn't something LLVM supports. See e.g. this thread. I brought this up with LLVM and I think it's a docs-only change to add support for at least a basic version of this -- but before we allow anything like that in Miri or otherwise consider this a blessed pattern in Rust, we need to get LLVM fixed.

(Also note that realloc, even if the address stays the same, generates a new provenance. Accesses through the old pointer are always UB. So it's not just about address stability, it's about keeping the provenance alive.)

This issue is mostly about supporting more things to be done with mmap without having to change the Rust memory model.

RalfJung avatar Jul 04 '24 11:07 RalfJung

This issue is mostly about supporting more things to be done with mmap without having to change the Rust memory model.

Yes, I understand. This is why I haven't mentioned use of mremap in my initial comment.

newpavlov avatar Jul 04 '24 11:07 newpavlov

(I think @nhusung is the right github handle? If not, I am sorry.^^ If yes, would be nice to record this in your rust-lang Zulip profile so people have a chance of pinging you here as well. :D )

Reserving address space with PROT_NONE,

Re-reading this, I don't think this can work, and I don't think this is what Nils had in mind. PROT_NONE, IIUC, means that accessing that memory traps? That means it can't be inside the logical allocation. The MAP_NORESERVE step was meant to create the allocation, so it must make that memory available to the program (reads and writes must behave like normal accesses).

RalfJung avatar Nov 10 '24 17:11 RalfJung

PROT_NONE, IIUC, means that accessing that memory traps?

Yes, but how is it different from traps which may happen with memory-mapped files? Maybe PROT_NONE allocations can be handled as logical allocations, with the read/write/execute protections treated as a platform-dependent detail which may not be emulated perfectly by Miri?

newpavlov avatar Nov 11 '24 15:11 newpavlov

Traps when accessing mapped files are also generally UB. See prior discussions elsewhere for the difficulties of memory mapped files in Rust (and other similar languages like C), e.g. on IRLO.

Memory that traps cannot be considered inside a Rust allocation.

But please keep this issue focused on Miri, those questions of what Rust code can legally do are really better suited for the opsem Zulip or the UCG issue tracker.

RalfJung avatar Nov 11 '24 17:11 RalfJung

I think @nhusung is the right github handle? If not, I am sorry.^^ If yes, would be nice to record this in your rust-lang Zulip profile so people have a chance of pinging you here as well. :D

Yes, that’s the right handle, and I just updated my Zulip profile. Thanks for the note :D

Reserving address space with PROT_NONE,

Re-reading this, I don't think this can work.

Yes, this was one of the conclusions from the Zulip discussion linked in the initial issue comment. With PROT_READ | PROT_WRITE instead of PROT_NONE we would get a proper allocation (in Rust's and LLVM's terms), so we agreed that this is what is currently supported by Rust's memory model. This issue is about supporting this in Miri as well. Mapping with PROT_NONE and subsequently re-mapping would need deeper changes to the memory model of Rust (and also LLVM), if I understood it correctly. So that is beyond the scope of this issue.

nhusung avatar Nov 12 '24 08:11 nhusung

FWIW LLVM now officially documents that allocations can grow, so we could document the same for Rust and also support that in Miri by mmap'ing more pages just at the end of an existing mmap allocation.

Partial munmap, or growing to the left, are still not supported and need further discussion since there are optimizations LLVM could be doing that would be broken by those operations.

RalfJung avatar Aug 30 '25 12:08 RalfJung