rp-hal Multicore Lockout

I was experimenting with flash writing using rp2040-flash and littlefs2. Everything was running smoothly until I wanted to try multicore with it.

As I understand it is because both cores run directly from the flash (unless you use #[link_section = ".data.ram_func"]) and as one core litteraly unplug the flash the other instantly crash and cannot recover when the flash is plugged back.

To solve this issue, there is these functions: multicore_lockout provided by the C SDK, that can pause the other core in a safe mode until the lock is unlock (like a SpinLock but for the entire core). However, I cannot find any implementation of this in the crate, neither any pratical way to solve the problem using other ways.

Apr 20 '24 23:04 Captainfl4me

I dig a bit and wanted to try implementing a SIO interrupt like they did with the lockout function. For that, on core 0 I did:

    vt.init(&mut pac.PPB);
    vt.register_handler(pac::Interrupt::SIO_IRQ_PROC0 as usize, test);
    unsafe {
        vt.activate(&mut pac.PPB);
        pac::NVIC::unmask(pac::Interrupt::SIO_IRQ_PROC0);
    }

Here is my function test:

#[inline(always)]
#[link_section = ".data.ram_func"]
extern "C" fn test() {
...
}

The function is triggered, however it seems like the core never come back from it and the execution stop. (I removed all flash function for my tests)

Apr 21 '24 01:04 Captainfl4me

It's not obvious to me what's going wrong here. #[inline(always)] in combination with a ram function looks strange, but I don't think that's the issue here as when it's called from an interrupt, it can't be inlined anyway.

Can you publish a small example reproducing the issue into a git repo? Perhaps I can see more looking at the full code.

Apr 26 '24 13:04 jannic

Hey,

Thanks, for looking at this issue! I'm going to write a small example to isolate the behavior I'm trying to achieve so hopefully it will makes more sense to you.

I've done some test and almost figure out how to achieve what I wanted. Basically, the problem with the interrupts occurs when I was trying to use fancy calls inside the interrupt (like rebuild builtin WS2812 object for debugging) but I think it was conflicting with object inside the main function causing corruption and unwanted behavior in the main function (basically crashing it when the interrupt returned). Now that I've a cleaner interrupt with only required code inside it seems to work. The only problem left is that even if the interrupt function runs from RAM when interacting with the flash using core1, core0 seems to have a hard time recovering from the interrupt.

I'll send you the repos link with the code soon. (it will makes more sense)

Apr 26 '24 14:04 Captainfl4me

Here is the example code: https://github.com/Captainfl4me/rust-2-rp/blob/master/multicore-lockout-irq/src/main.rs

Apr 26 '24 14:04 Captainfl4me

While I do not yet know what exactly causes your code to fail, some ideas where I'd investigate further:

your write_flash function calls several other functions. Why the function itself is in RAM, there's no guarantee these function calls don't use flash. While functions like rom_data::flash_flush_cache(); in the end call some code in ROM, there are some intermediate steps inside rp2040_hal that may or may not be inlined into your RAM function. In general, it's difficult to guarantee that some rust code is entirely located in RAM and doesn't call any code in flash. Because of this, in https://github.com/jannic/rp2040-flash/blob/master/src/lib.rs#L174, these calls are manually ported to assembly.
The same is true for fifo_wait_in_ram. I'm not even sure if the link_section applies to the closure inside the critical section. Here, I'd put the contents of the critical section into a separate function, put that function into RAM, and manually verify that it's compiled down to some simple code that's not calling any external function.

Apr 26 '24 16:04 jannic

Uh, even worse, it looks like rustc doesn't inline the sio read/read_blocking calls:

20000000: b5d0          push    {r4, r6, r7, lr}
20000002: af02          add     r7, sp, #0x8
20000004: b082          sub     sp, #0x8
20000006: f000 f827     bl      0x20000058 <__Thumbv6MABSLongThunk__critical_section_1_0_acquire> @ imm = #0x4e
2000000a: 4604          mov     r4, r0
2000000c: 480a          ldr     r0, [pc, #0x28]         @ 0x20000038 <__sdata+0x38>
2000000e: 2101          movs    r1, #0x1
20000010: 7001          strb    r1, [r0]
20000012: a801          add     r0, sp, #0x4
20000014: f000 f826     bl      0x20000064 <__Thumbv6MABSLongThunk_rp2040_hal::sio::SioFifo::read::h41c4759b5756f495> @ imm = #0x4c
20000018: 2801          cmp     r0, #0x1
2000001a: d108          bne     0x2000002e <__sdata+0x2e> @ imm = #0x10
2000001c: 2969          cmp     r1, #0x69
2000001e: d106          bne     0x2000002e <__sdata+0x2e> @ imm = #0xc
20000020: a801          add     r0, sp, #0x4
20000022: f000 f825     bl      0x20000070 <__Thumbv6MABSLongThunk_rp2040_hal::sio::SioFifo::read_blocking::he8005b2c0e2eed67> @ imm = #0x4a
20000026: 2121          movs    r1, #0x21
20000028: 0249          lsls    r1, r1, #0x9
2000002a: 4288          cmp     r0, r1
2000002c: d1f8          bne     0x20000020 <__sdata+0x20> @ imm = #-0x10
2000002e: 4620          mov     r0, r4
20000030: f000 f824     bl      0x2000007c <__Thumbv6MABSLongThunk__critical_section_1_0_release> @ imm = #0x48
20000034: b002          add     sp, #0x8
20000036: bdd0          pop     {r4, r6, r7, pc}
20000038: ee 40 00 20   .word   0x200040ee

So your fifo_wait_in_ram is actually jumping to flash, which won't work if your flash gets disabled.

(Generated with cargo objdump --release -- -d)

Apr 26 '24 17:04 jannic

Thank you so much for the insights! I did know about objdump I think it will be very useful to better understand what is actually going on.

For the write_flash function I replicated the code from rp2040_flash library but without the writing parts because I didn't think it was actually needed. I just use the Hal functions but as you say maybe that's also causing issue.

I'll take a look and let you know!

Apr 27 '24 08:04 Captainfl4me

Well you're right after a close inspection of the dump file it seems like all the function calls (rom_data::, critical_section_acquire, critical_section_release, read, read_blocking) actually live in ROM (section .text) therefore causing a jump to the ROM during interrupt.

Not sure if there is a way to force rust to inline the functions ? So I guess I'll try writing my RAM function in asm directly so that it is inline (like they did with rp2040_flash)

Apr 27 '24 10:04 Captainfl4me

I refactor the code to have the following (I push the changes if you want to take a look on the repos),

#[inline(never)]
#[link_section = ".data.ram_func"]
extern "C" fn fifo_wait_in_ram() {
    // critical_section::with(|_| {
    if fifo_is_read_ready() {
        if fifo_read() != LOCKOUT_CORE {
            return;
        }

        loop {
            if fifo_is_read_ready() && fifo_read() == UNLOCKOUT_CORE {
                break;
            }
        }
    }

    // Drain the FIFO before returning
    while fifo_is_read_ready() {
        let _ = fifo_read();
    }
    // });
}

#[link_section = ".data.ram_func"]
pub fn fifo_is_read_ready() -> bool {
    let sio = unsafe { &(*pac::SIO::ptr()) };
    sio.fifo_st.read().vld().bit_is_set()
}

#[link_section = ".data.ram_func"]
pub fn fifo_read() -> u32 {
    let sio = unsafe { &(*pac::SIO::ptr()) };
    sio.fifo_rd.read().bits()
}

According to the objdump, there is no more call to functions living in ROM. (I also update the core1 to use rp2040_flash so that I don't have to call rom_data). However, even if, without the flash logic (if I remove the flash_range_program) the interrupt is working, when I add the flash_range_program the program freeze after triggering the interrupt (not sure when, if it is during or after)

Apr 27 '24 13:04 Captainfl4me

Okay my bad it was just a problem with the way I was passing the buffer to the flash_range_program function. Now, everything seems to work!!

Code here

Apr 27 '24 13:04 Captainfl4me

Glad to hear that it works now, and thanks for posting the working code!

Apr 27 '24 17:04 jannic

rp-hal rp-hal copied to clipboard

Multicore Lockout

rp-hal
rp-hal copied to clipboard