tinygo icon indicating copy to clipboard operation
tinygo copied to clipboard

runtime: check for heap allocations inside interrupts

Open aykevl opened this issue 4 years ago • 5 comments

This is unsafe and should never be done. Therefore, add a safety check here. It adds around 32 bytes of extra code but I think that's worth the extra safety, as the alternative is risking memory corruption in very rare circumstances that are therefore near-impossible to debug.

Unfortunately, at the moment there are a number of cases where heap allocations occur in interrupts. For example:

  • Scanning in the bluetooth package has a heap allocation in an interrupt here.
  • At least the Circuit Play Express stops working with this change, probably other samd based boards too (untested so far).

The BBC micro:bit and the HiFive1 rev B appear to be working with the few things I tested.

This is still a draft not because the PR isn't ready, but rather because the PR surfaces too many bugs that need to be fixed before this PR can be merged.

aykevl avatar Oct 25 '20 14:10 aykevl

Bugs related to this can be most easily detected using GDB:

~/src/github.com/tinygo-org/bluetooth$ tinygo gdb -target=pca10056-s140v7 -opt=1 ./examples/scanner/
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
[...]
target halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x00000a80 msp: 0x20000400
(gdb) b runtime.runtimePanic
Breakpoint 1 at 0x2729a: file /home/ayke/src/github.com/tinygo-org/tinygo/src/runtime/panic.go, line 18.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, runtime.runtimePanic (msg=...) at /home/ayke/src/github.com/tinygo-org/tinygo/src/runtime/panic.go:18
18		printstring("panic: runtime error: ")
(gdb) bt
#0  runtime.runtimePanic (msg=...) at /home/ayke/src/github.com/tinygo-org/tinygo/src/runtime/panic.go:18
#1  0x00027390 in runtime.alloc (size=7) at /home/ayke/src/github.com/tinygo-org/tinygo/src/runtime/gc_conservative.go:224
#2  0x00028132 in tinygo.org/x/bluetooth.handleEvent () at /home/ayke/src/github.com/tinygo-org/bluetooth/adapter_nrf528xx.go:110
#3  0x000286be in (*tinygo.org/x/bluetooth.Adapter).Enable$1 (arg0=...) at /home/ayke/src/github.com/tinygo-org/bluetooth/adapter_sd.go:77
#4  0x000289d0 in SWI2_EGU2_IRQHandler ()
#5  <signal handler called>
#6  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) quit

The above code shows that this bug is at github.com/tinygo-org/bluetooth/adapter_nrf528xx.go:110.

aykevl avatar Oct 25 '20 14:10 aykevl

I have tracked some heap allocation bugs here: https://github.com/tinygo-org/tinygo/issues/1461 These bugs must be fixed before this PR can be merged.

aykevl avatar Oct 27 '20 12:10 aykevl

Tests finally pass! But as indicated in #1461, there is one bug left in the bluetooth package that needs to be fixed before this can be merged.

aykevl avatar Apr 24 '21 00:04 aykevl

How are you searching for these allocs? Just making a best-guess via source code inspection? Setting -gc=none? Or are you using the new -print-allocs flag?

ardnew avatar May 05 '21 07:05 ardnew

Basically a combination of all three. Initially I found a few by running some programs on this branch, setting a breakpoint on runtime.nilpanic (or similar) and running until it crashed. At that point, the stack trace shows where the allocation happens. Later I've mostly used -print-allocs=machine.

aykevl avatar May 05 '21 13:05 aykevl

LGTM - however, it seems to be finding some 'suspect' code in RP2040 :)

There's a bunch of 'allocs' in USB code - so I guess there.

kenbell avatar Oct 05 '22 14:10 kenbell

With #3219 and #3218 I'm able to get basic examples working on RP2040 with this change.

kenbell avatar Oct 05 '22 15:10 kenbell

@aykevl - is it worth rebasing this? The RP2040 interrupt fix and the USB fixes are now in dev - are we at the point this should be merged to dev and we work on fixing up any issues that come up?

kenbell avatar Oct 14 '22 14:10 kenbell

There's a bunch of 'allocs' in USB code - so I guess there.

This was the single-most influential factor that drove the design of my USB package:

     :bulb: Perform no heap allocations anywhere.

Of course, it turns out I was wildly naive to have any expectation of succeeding at this using Go or TinyGo. There's a few reasons for that, but none of which have a clear, best solution. Two off the top of my head:

  1. The Go language spec does not define any syntax, context, or semantics that allows a programmer to express via source code whether some object should exist on the stack or escape to the heap.
    • Luckily, the Go profiling tools are quite robust and insightful. It usually isn't too difficult to use them to find the true culprit and iterate alternative design patterns to get what you're wanting (which usually means writing Rust code in the Go language).
  2. Much of the standard library operates on heaps internally (as they should, because it is often the correct choice, given the available resources), and much of its API depends on heap-allocated objects
    • Like 1 above, you will lean on the memory profiler to identify the culprit. Unfortunately, the only solution here is to simply avoid the standard library and roll your own code. Of course, it would be very wise to use the standard library source as a starting point. You can usually discard 80% of what's there, as their problem spaces tend to be much larger and require support for many edge cases and pathological conditions.

With that being said, one thing my USB package does well (since it certainly didn't end up being very readable or maintainable) is it has very, very few allocations. I would highly recommend — whomever will be working the RP2040 USB allocs — to consider consulting that package's architecture:

  1. All of the descriptors, device class allocations, and other things which most USB implementations construct at runtime are all instead allocated at compile-time.
  2. Build tags help prune any large structures that are not applicable to the configured device class, number of ports, etc.

ardnew avatar Oct 15 '22 09:10 ardnew

Rebased. If this passes all tests, I think now is a good time to merge it so that we have plenty of time to shake out the remaining bugs before the release (and also to give a stronger incentive to actually fix those bugs as subtly broken code will now panic loudly).

aykevl avatar Feb 15 '23 17:02 aykevl

Time to get this in so we can fix whatever problems it exposes. Thanks for working on it @aykevl and to @ardnew and @kenbell for looking it over.

deadprogram avatar Feb 19 '23 10:02 deadprogram