rust-sel4 Reads and writes to the IPC buffer are not marked as volatile

I have a system that I've been developing and mostly running in debug mode. When I tried running it in release mode, I was running into issues where the contents of the IPC buffer after a PPC did not match what the server put inside the IPC buffer.

I think this is because reads and writes to the IPC buffer are not marked volatile, as changing my code from

ipc_buf.msg_regs()[0]

to

core::ptr::read_volatile(ipc_buf.msg_regs().as_ptr())

seems to resolve the issue. I'm wondering if it should be the responsibility of the user to do this every time, or whether rust-sel4 should provide an abstraction that does this by default, as this seems like the behaviour that you would always want.

May 21 '25 03:05 alwin-joshy

Interesting, thanks for raising this issue.

According to my understanding, the inline assembly (asm!("svc 0", ...)) at the core of each syscall should serve as a compiler barrier, obviating the need for *_volatile accesses of the IPC buffer.

Can you provide more context (or, if possible without too much trouble, a reproducible example) for the case that isn't working?

May 21 '25 09:05 nspin

A bit hard to create a minimal example in the system I have now, I might try something with microkit later today to see if it's the same, but the code basically looks like this

let (handle, endpoint) = sel4::with_ipc_buffer_mut(|ipc_buf| {
            ipc_buf.set_recv_slot(slot);
            // ipc_buf.msg_regs_mut()[0] = ntfn_buffer as u64;
            unsafe {
                core::ptr::write_volatile(ipc_buf.msg_regs_mut().as_mut_ptr(), ntfn_buffer as u64)
            };
            let mut msginfo = sel4::MessageInfoBuilder::default()
                .label(SMOSInvocation::ConnPublish as u64)
                .length(1)
                .build();
            msginfo = self.ep().call(msginfo);
            assert!(msginfo.length() == 1 && msginfo.extra_caps() == 1);
            return Ok((
                // ipc_buf.msg_regs()[0]
                unsafe { core::ptr::read_volatile(ipc_buf.msg_regs().as_ptr()) },
                sel4::CPtr::from_bits(slot.path().bits()).cast::<sel4::cap_type::Endpoint>(),
            ));
 }

The value 'read' from the IPC buffer in the case of the non-volatile read is the same as the value that was put into the IPC buffer on the 3rd line.

May 21 '25 22:05 alwin-joshy

I was able to make a minimal microkit example that has the same problem here. It's pretty hacked together though and I didn't use your dockerfile/nix setup.

One of the components being in C and the other being in Rust is unrelated to the original situation, which had the caller and callee both in Rust, this is just the simplest system I had on hand to modify.

May 22 '25 01:05 alwin-joshy

I am not familiar with either Rust or rust-sel4, but this is a simple case that illustrate the issue

pub fn f(x: &mut i32) -> i32 {
    *x = 42;
    unsafe {asm!("svc 0")};
    *x
}

pub fn g(x: &mut i32) -> i32 {
    unsafe {core::ptr::write_volatile(x, 42)};
    unsafe {asm!("svc 0")};
    unsafe {core::ptr::read_volatile(x)}
}

A compiler barrier is not sufficient to prevent the compiler from fusing memory accesses.

May 22 '25 02:05 KurtWu10

why is there a &mut IpcBuffer in the first place? having means you assert no aliasing, but it seems like you are sharing the ipc buffer.

also, i think part of the motivation for reasoning about aliasing is to permit transformations in the presence of unknown foreign code, so i don't think you can really "compiler barrier" such reasoning.

May 22 '25 02:05 sledgehammervampire

Ahhh, right. That makes sense. Maybe I need to do two seperate with_ipc_buffer blocks and do the IPC invocation in the middle? I'll try that out.

Edit: Breaking it up into two separate with_ipc_buffer blocks seems to work.

Is there anything rust-sel4 can do to prevent this? If not, maybe a note should be added to the with_ipc_buffer functions saying that you should never do an IPC operation inside a with_ipc_buffer block and/or make these unsafe functions?

May 22 '25 02:05 alwin-joshy

it seems like there is unsoundness in some safe interface, since we can alias the ipc_buffer argument of with_ipc_buffer by calling the endpoint, which is declared safe.

May 22 '25 07:05 sledgehammervampire

A compiler barrier is not sufficient to prevent the compiler from fusing memory accesses.

Ah, you're right, the mutable reference to the memory being accessed by the assembly fragment is the issue. I wish there were a way to mark a mutable reference as being accessed by an asm!() fragment to a similar effect as that of passing a mutable reference to an FFI function.

May 22 '25 08:05 nspin

Is there anything rust-sel4 can do to prevent this? If not, maybe a note should be added to the with_ipc_buffer functions saying that you should never do an IPC operation inside a with_ipc_buffer block and/or make these unsafe functions?

why is there a &mut IpcBuffer in the first place? having means you assert no aliasing, but it seems like you are sharing the ipc buffer.

it seems like there is unsoundness in some safe interface, since we can alias the ipc_buffer argument of with_ipc_buffer by calling the endpoint, which is declared safe.

The sel4 crate handles the fact that the IPC buffer is shared with the kernel by wrapping accesses by both the PD and the kernel with a dynamic borrow check similar to that of a RefCell. Accesses by the PD are wrapped with e.g. with_ipc_buffer, and accesses by the kernel are wrapped by the IPC operation implementations. So, by this design, invoking an IPC operation within a with_ipc_buffer block or nesting a with_ipc_buffer block within another should panic in the same way that calling RefCell::borrow_mut twice panics.

These dynamic borrow checks had a bug that I just now fixed in https://github.com/seL4/rust-sel4/pull/257. After this PR, @alwin-joshy's PD will panic with a borrow error rather than exhibit unsound behavior.

May 22 '25 09:05 nspin

Yep, just tested the minimal example with your change and it crashes like you said, though the error message is not super clear.

pong: panicked at /Users/alwinjoshy/.cargo/git/checkouts/rust-sel4-d7edb8214cac2bde/4e5c162/crates/sel4/src/state/mod.rs:147:46:
called `Result::unwrap()` on an `Err` value: BorrowMutError(())

Maybe the unwrap should be changed to an expect() with a more helpful error message.

May 23 '25 05:05 alwin-joshy

Good call, I've improved the relevant error messages here:

https://github.com/seL4/rust-sel4/pull/259

May 23 '25 05:05 nspin

Awesome, that looks good.

So to conclude this issue, the fix is basically to always put marshalling and unmarshalling code in seperate with_ipc_buffer blocks seperated by the IPC invocation. This is guaranteed to prevent the memory access fusing @KurtWu10 demonstrated because the compiler memory barrier resulting from an asm (or something else) will keep these two blocks completely independent?

May 23 '25 05:05 alwin-joshy

I am not sure about this. I cannot find an explicit reference in rust's manual on this. I assume that even without the asm, the code will work because of transfer of ownership. (the first block has the exclusive ownership, then the kernel, and finally the second block)

May 23 '25 05:05 KurtWu10

Awesome, that looks good.

So to conclude this issue, the fix is basically to always put marshalling and unmarshalling code in seperate with_ipc_buffer blocks seperated by the IPC invocation. This is guaranteed to prevent the memory access fusing @KurtWu10 demonstrated because the compiler memory barrier resulting from an asm (or something else) will keep these two blocks completely independent?

i don't think it's because of any compiler memory barrier, but rather because the with_ipc_buffer calls are separated, the ipc buffer mut borrow is shortened and so the compiler can't assume nonaliasing and do stuff like fusing memory accesses.

May 24 '25 05:05 sledgehammervampire

i don't think it's because of any compiler memory barrier, but rather because the with_ipc_buffer calls are separated, the ipc buffer mut borrow is shortened and so the compiler can't assume nonaliasing and do stuff like fusing memory accesses.

This is correct. Runtime checks (fixed in https://github.com/seL4/rust-sel4/pull/257) enforce the property that the scopes of the mutable references to the IPC buffer obtained via with_ipc_buffer don't overlap with IPC operation calls, which is necessary because the kernel uses the IPC buffer during those calls.

May 24 '25 07:05 nspin

However, thanks to insight gleaned from this discussion, I now realize that the handling of the IPC buffer in the implementation of the IPC operations in the sel4-sys crate is likely unsound. I'm struggling to find Rust reference material detailed enough about the cases at hand to know for sure.

The problem is that IPC invocations and syscalls take a mutable reference to the IPC buffer and do the asm!("svc 0", ...) within the scope of that mutable reference. One solution would be to use a hack like this to inform the compiler that the inline assembly fragment uses the IPC buffer, to a similar effect as that of passing a mutable reference to an FFI function:

(compiler explorer link)

// bad
#[unsafe(no_mangle)]
pub fn f(x: &mut i32) -> i32 {
    *x = 42;
    unsafe {
        asm!("svc 0");
    }
    *x
}

// good
#[unsafe(no_mangle)]
pub fn g(x: &mut i32) -> i32 {
    *x = 42;
    unsafe {
        asm!("svc 0 // {hack}", hack = in(reg) x);
    }
    *x
}

However, this is obviously a hack and, while the runtime cost is about as small as could possibly be, it's not a totally satisfying solution.

Another option would be to change the interface between the sel4 and sel4-sys crates to use some sort of opaque reference type rather than a &mut seL4_IPCBuffer, to have more control over the scope of actual mut refs. One downside of such an approach is that it would complicate the corners of the sel4 crate API that support using an explicit IPC buffer, which is useful for runtimes that don't have access to TLS.

May 24 '25 07:05 nspin