tinygo icon indicating copy to clipboard operation
tinygo copied to clipboard

fix(rp2350): add software spinlocks

Open mikesmitty opened this issue 4 months ago • 8 comments

As it turns out, the RP2350 has hardware spinlocks that can be unlocked by writes to nearby addresses, the lower spinlocks currently in use in TinyGo happen to be unlocked by writes to the doorbell interrupt registers used to signal between cores, very possibly leading to some unexpected unlocks. This was not corrected in the A3 or A4 steppings and instead software spinlocks are used by default on RP2350 in pico-sdk:

RP2350 Warning. Due to erratum RP2350-E2, writes to new SIO registers above an offset of +0x180 alias the spinlocks, causing spurious lock releases. This SDK by default use atomic memory accesses to implement the hardware_sync_spin_lock API, as a workaround on RP2350 A2.

https://www.raspberrypi.com/documentation/pico-sdk/hardware.html#group_hardware_sync

Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same:

[...] the default spinlock related methods here (e.g. spin_lock_blocking) always disable interrupts while the lock is held as use by IRQ handlers and user code is common/desirable, and spin locks are only expected to be held for brief periods.

These are the software spinlock macros ported over: https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L112 https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L197

mikesmitty avatar Sep 10 '25 01:09 mikesmitty

Oh, whoops. My go fmt extension has been flaking out on me. Will have the missing rp2040 imports updated in a moment. Here's the lock/unlock disassembled output with inlining disabled:

   0x10001178 <(*runtime.spinLock).Lock+0>:     cbz     r0, 0x10001192 <(*runtime.spinLock).Lock+26>
   0x1000117a <(*runtime.spinLock).Lock+2>:     ldaexb  r2, [r0]
   0x1000117e <(*runtime.spinLock).Lock+6>:     movs    r1, #1
   0x10001180 <(*runtime.spinLock).Lock+8>:     cmp     r2, #0
   0x10001182 <(*runtime.spinLock).Lock+10>:    bne.n   0x1000117a <(*runtime.spinLock).Lock+2>
   0x10001184 <(*runtime.spinLock).Lock+12>:    strexb  r2, r1, [r0]
   0x10001188 <(*runtime.spinLock).Lock+16>:    cmp     r2, #0
   0x1000118a <(*runtime.spinLock).Lock+18>:    bne.n   0x1000117a <(*runtime.spinLock).Lock+2>
   0x1000118c <(*runtime.spinLock).Lock+20>:    dmb     sy
   0x10001190 <(*runtime.spinLock).Lock+24>:    bx      lr
   0x10001192 <(*runtime.spinLock).Lock+26>:    bl      0x100013f4 <runtime.nilPanic>
   0x100013e4 <(*runtime.spinLock).Unlock+0>:   cbz     r0, 0x100013ee <(*runtime.spinLock).Unlock+10>
   0x100013e6 <(*runtime.spinLock).Unlock+2>:   movs    r1, #0
   0x100013e8 <(*runtime.spinLock).Unlock+4>:   stlb    r1, [r0]
   0x100013ec <(*runtime.spinLock).Unlock+8>:   bx      lr
   0x100013ee <(*runtime.spinLock).Unlock+10>:  bl      0x100013f4 <runtime.nilPanic>

mikesmitty avatar Sep 10 '25 02:09 mikesmitty

I support the switch to atomic instructions, but if you need something that works right away, RP2350-E2 mentions that some spinlocks are not affected:

The following SIO spinlocks can be used normally because they don’t alias with writable registers: 5, 6, 7, 10, 11, and 18 through 31. Some of the other lock addresses may be used safely depending on which of the high-addressed SIO registers are in use. Locks 18 through 24 alias with some read-only TMDS encoder registers, which is safe as only writes are mis-decoded.

Thank you for tracking those down. I figured there would probably be some, but I couldn't find them

mikesmitty avatar Sep 10 '25 23:09 mikesmitty

After doing some testing, I think this might not actually be locking properly. Going to do some more thorough testing

mikesmitty avatar Sep 18 '25 01:09 mikesmitty

Yes, this looks good. @mikesmitty can you apply the changes proposed by @eliasnaur?

Also, you may want to rebase on the dev branch to hopefully fix the assert-test-linux CI failure.

aykevl avatar Sep 23 '25 12:09 aykevl

Sure, here's the rebase. I haven't had time to test or diagnose it yet, been stuck working on a project that's consumed all my free time, but I'm pretty sure it's not functioning properly as confirmed by that CI failure

mikesmitty avatar Sep 24 '25 12:09 mikesmitty

Hmm, no I guess it is actually locking. I'm not sure what's happening with that CI test though

mikesmitty avatar Sep 24 '25 13:09 mikesmitty

Restarting CI, these are probably flaky tests.

aykevl avatar Sep 29 '25 13:09 aykevl

Ok, finally coming back to take a look at this again. I still have some reservations about the stability of this one. I'm getting mysterious panics in various runtime locations that I haven't tracked down yet. I suspect some race conditions are going on that I haven't narrowed down yet

For example:

[2025-10-03T13:28:06.865Z, +005191ms] panipanic: runtime  ee[2025-10-03T13:28:06.877Z, +005203ms] rrrroorr  aatt  0x100036ed0x10003a5d: unsafe.Slice/String: len out of range

mikesmitty avatar Oct 03 '25 13:10 mikesmitty