riscv icon indicating copy to clipboard operation
riscv copied to clipboard

`riscv-rt`: example multicore.rs could cause HART to loop forever because of race condition

Open MLFlexer opened this issue 10 months ago • 4 comments

In the multicore.rs example, the other HARTs wait for an interrupt before entering main. However it seems as though there could be a race condition if HART 0 sets the interrupt flag to 1 in https://github.com/rust-embedded/riscv/blob/95cfb90c66fbfc0d66975a4235e1aeab96df9c39/riscv-rt/examples/multi_core.rs#L47 , and then HART 1 sets it to 0 in https://github.com/rust-embedded/riscv/blob/95cfb90c66fbfc0d66975a4235e1aeab96df9c39/riscv-rt/examples/multi_core.rs#L19.

This would make the interrupt flag be 0, when HART 1 is entering the loop in https://github.com/rust-embedded/riscv/blob/95cfb90c66fbfc0d66975a4235e1aeab96df9c39/riscv-rt/examples/multi_core.rs#L24-L29, and remain 0, because HART 0 has already set the flag, but it was overwritten by HART 1.

MLFlexer avatar Feb 28 '25 10:02 MLFlexer

If one wanted to introduce a global Mutex or Atomic (for lock-free algorithms), you could probably assure mutually-exclusive access to the portions that write to the IPI address.

However, as I understand the example, it is assuming this is a multicore platform, with a boot HART of 0.

It sleeps the other cores until they receive an interrupt written to the IPI address.

All other HARTS would clear their IPI flag, and be stuck in the inner wfi loop until they receive another interrupt (breaking out if they receive an msoft interrupt).

So, if HART 0 sets the IPI for HART 1 before HART 1 enters _mp_hook, it would not break out of the inner loop because of the wfi call. Is that what you're saying?

However, aren't all of the HARTS going to be asleep before being woken by an IPI call? If you are assuming that HARTS were all woken by some previous software that hands control to this example, then I think you are correct, there is a potential race condition. And as mentioned above, you could use a Mutex or lock-free algorithm to control access to peripheral memory.

Feel free to make the example more robust against a potential race condition!

rmsyn avatar Apr 08 '25 23:04 rmsyn

In any case, the _mp_hook functions must be implemented in assembly, as it is executed before initializing RAM. Thus, we need to rework the example.

romancardenas avatar Jun 09 '25 09:06 romancardenas

Given the memory addresses, it looks like the example is using a CLINT peripheral. The address '02000000 + 4 * n' corresponds to the msip register of the nth HART.

It is important to know that, when _mp_hook is executed, RAM is still uninitialized. Therefore, using mutexes would probably fail. Also, using Rust at this point is unsound, unfortunately. Additionally, I guess that each PAC should define its custom _mp_hook function to adapt to its particularities.

While @MLFlexer is correct about a potential race condition, this is highly unlikely, as HART 0 will take a considerable amount of time until the RAM is initialized, jumps to main, and triggers the software interrupts (compared to the other HARTS, which only clear their MSIP and wait for interrupts).

In any case, I agree that the multi-processor boot deserves a rework. I would leave it to a following iteration.

romancardenas avatar Jun 10 '25 10:06 romancardenas

While @MLFlexer is correct about a potential race condition, this is highly unlikely, as HART 0 will take a considerable amount of time until the RAM is initialized, jumps to main, and triggers the software interrupts (compared to the other HARTS, which only clear their MSIP and wait for interrupts).

I mostly agree that for the simplified example, maybe introducing a memory barrier of some form is too much. However, at least a comment along the lines of "this example is potentially vulnerable to a race condition, see these resources for how to protect against it..."

As a small experiment, I wrote some pseudo-code for what the lock-free algorithm could look like: compiler-explorer lock-free example.

The asm output could be used for an extended example that includes what a concurrency-safe _mp_hook could look like.

core::sync::atomic::atomic_load + core::sync::atomic::atomic_store look to be the most involved to port over. The assembly could be put into a global_asm block or naked function to get around the "Rust isn't sound in _mp_hook" problem.

In any case, I agree that the multi-processor boot deserves a rework. I would leave it to a following iteration.

Definitely, if you get to it first, I'd be happy to review / work on it together. Or, if @MLFlexer wants to have a go at it :)

rmsyn avatar Jun 11 '25 01:06 rmsyn