capnproto-rust
Best practices for reading from long-lived mmap'ed files?
Not sure if this is actually an issue, but I'm not sure what the best venue is for asking this question. I'd like to create a type that mmap's a file full of capnproto data and then exposes an interface to query it; I expect it to stick around for the life of the application. This kind of thing is discussed a lot in the upstream capnproto world, but it's not clear what the best practices are for accomplishing it in Rust.
It looks like there are two ways to construct readers, one that owns all of its data but gets it by copying it out of a reference you pass in (read_message) -- I don't want that, since copying it all into memory defeats the purpose of mmap'ing it -- or one that takes a reference and doesn't take ownership of the underlying data. Conceptually if I wanted to use the latter, since somebody has to own the data, I'd end up with a shape like:
pub struct MyData<'a> {
    data: memmap::Mmap,
    reader: capnp::message::Reader<capnp::serialize::SliceSegments<'a>>,
}
... except that I can't actually do that easily, because self-referential structs aren't allowed in Rust, so my reader can't hold a reference to my data.
So far, the things I've been able to figure out are:
- don't hold a reader and just instantiate a new reader temporarily anytime I have to read anything -- this seems wasteful, since it looks like the reading of segment tables and such at reader instantiation time isn't exactly free
- use the struct shape proposed above, but unsafe { std::mem::transmute() } the memory-mapped data to &'static -- I think this is safe, because the data will live as long as the structure does, and memory-mapped data won't move, but it feels icky and I'm not that confident in it
- instead of the regular reader, use the async/futures reader, which (it looks like) does take ownership of its data without copying; in that event I guess I'd wrap the memory-mapped data in a Cursor? But I don't actually need futures or any of that, so I'm not sure that makes sense either.
Is there another option I'm missing?
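To make the first option concrete, here is a minimal std-only sketch of an owner that hands out a short-lived borrowed view on demand. It does not use the capnp or memmap crates: `Owner`, `SegmentView`, `parse_segments`, and `read_u32_le` are hypothetical names, and a `Vec<u8>` stands in for the memory-mapped file. The parsing follows the Cap'n Proto stream framing (segment count minus one, then one size in words per segment, padded to an 8-byte boundary), which is roughly the work that gets repeated each time a view is built.

```rust
/// Stand-in for the mapping: a real version would hold `memmap::Mmap`,
/// which also derefs to `&[u8]`. (Assumption for illustration only.)
struct Owner {
    data: Vec<u8>,
}

/// Hypothetical borrowed view: one slice per segment, tied to the owner's lifetime.
struct SegmentView<'a> {
    segments: Vec<&'a [u8]>,
}

/// Reads a little-endian u32 at byte offset `at`, or None if out of bounds.
fn read_u32_le(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the segment table at the start of `bytes` and slices out each segment.
/// Returns None if the input is truncated or the sizes overflow.
fn parse_segments(bytes: &[u8]) -> Option<SegmentView<'_>> {
    // First word half: (number of segments - 1).
    let count = (read_u32_le(bytes, 0)? as usize).checked_add(1)?;
    // Table is 4 bytes of count plus 4 bytes per segment, padded to 8 bytes.
    let table_len = 4usize.checked_add(count.checked_mul(4)?)?;
    let mut offset = table_len.checked_add(7)? / 8 * 8;
    let mut segments = Vec::new();
    for i in 0..count {
        // Each size is given in 8-byte words.
        let words = read_u32_le(bytes, 4 + 4 * i)? as usize;
        let end = offset.checked_add(words.checked_mul(8)?)?;
        segments.push(bytes.get(offset..end)?);
        offset = end;
    }
    Some(SegmentView { segments })
}

impl Owner {
    /// Builds a fresh view on every call; nothing self-referential is stored.
    fn view(&self) -> Option<SegmentView<'_>> {
        parse_segments(&self.data)
    }
}
```

Because the view borrows from `&self`, the borrow checker is satisfied without any unsafe code; the trade-off is that the table is re-parsed on each call, which is the cost the first bullet worries about.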
> ... except that I can't actually do that easily, because self-referential structs aren't allowed in Rust, so my reader can't hold a reference to my data.
Have you investigated Pin, which is currently in beta?
@NickAtAccuPS I was definitely aware of Pin and its potential applicability to this problem, but it doesn't seem (yet) like there are great approaches for making self-referential structs using Pin, even though in theory I think the latter ought to facilitate the former; see, e.g., https://users.rust-lang.org/t/how-do-i-create-self-referential-structures-with-pin/24982/7 . Also: I think memmap would ideally be where memory-mapped data would be marked as Pin, rather than its consumers.
I'm also interested in this. I had a similar issue dealing with a Vec<u8> and a Reader<SliceSegments> inside the same struct, with the Reader referencing the Vec<u8>. I solved this by using the OwningHandle struct from the owning_ref crate.
@apendleton I don't know if you plan to have this MyData struct shared across threads, but if so, you'll run into an issue where the ReadLimiter in the Arena of the Reader contains a Cell that isn't Sync. I have a patched version using an atomic which I plan to upstream once Rust stabilizes the AtomicU64 (which should be very soon).
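As a rough illustration of why swapping the Cell for an atomic helps, here is a std-only sketch of a read limiter backed by AtomicU64. This is not capnp's actual ReadLimiter and not the patch mentioned above: `AtomicReadLimiter`, `can_read`, and `assert_sync` are hypothetical names, and the sketch assumes a target that provides 64-bit atomics.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a read limiter; not the type from the capnp crate.
struct AtomicReadLimiter {
    remaining: AtomicU64,
}

impl AtomicReadLimiter {
    fn new(limit: u64) -> Self {
        AtomicReadLimiter { remaining: AtomicU64::new(limit) }
    }

    /// Charges `amount` against the budget through a shared reference.
    /// Returns false, leaving the budget unchanged, if not enough remains.
    fn can_read(&self, amount: u64) -> bool {
        self.remaining
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                cur.checked_sub(amount)
            })
            .is_ok()
    }
}

/// Compiles only if `T` is Sync. A struct holding `Cell<u64>` would be rejected here.
fn assert_sync<T: Sync>() {}
```

With a `Cell<u64>` in place of the `AtomicU64`, the same `&self` method would still compile, but the struct would no longer be Sync, so it could not be placed in a lazy_static or shared across threads by reference.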
@appaquet I did run into that, yeah. Not because of threads yet, but because the test suite for the library I'm using makes pretty heavy use of lazy_static, which requires that data it exposes be Sync. My hack around that problem was uglier than yours though.
I've released 0.14, which adds support for atomic read limiting. https://dwrensha.github.io/capnproto-rust/2020/12/19/atomic-read-limiting.html
That should help, though there's probably a lot more we could do to make mmap more convenient to use with capnproto-rust.
#243 should make things easier.