leaky-bucket icon indicating copy to clipboard operation
leaky-bucket copied to clipboard

Remove some unsafe

Open conradludgate opened this issue 3 months ago • 2 comments

At Neon, we recently had a panic triggered in tokio RawTask Waker functions, specifically about mismatched ref counts, it appeared in the callstack of leaky-bucket. We didn't find anything wrong in the tokio code, but given leaky-bucket was in the stack frame, I started looking here.

While necessary for intrusive linked lists, this code has a lot of unsafe. I ran it through miri (I had to enable the start_paused flag with tokio mocked time to do so) but didn't find anything. However, in the process, I realised there was some low-hanging unsafe code to clean up.

Main changes:

  1. pin-project-lite is already a dependency through tokio, so we might as well use it instead of writing manual unsafe projections.
  2. The code with the UnsafeCell was written using manual pointer offsets, even through &mut. UnsafeCell has a safe get_mut method which we can just use instead. This also fixed the pointer accesses in the &self versions to use addr_of instead of pointer offsets.

The first commit in this PR does have a noticeable change in a few of the docs. I am open to reverting those changes. However, in the tests/ it makes the tests deterministic. I had trouble running the tests because they would take a couple milliseconds more than expected and fail. They are also slow tests. Using mocked tokio time makes them instant and also deterministic.

Unfortunately I haven't yet found the source of the bug, but I thought I would upstream these changes regardless. I'm also exploring a significant rewrite to remove almost all unsafe by using https://docs.rs/pin-list/latest/pin_list/ as the linkedlist impl, and not write our own, but it's more than I can pull off with the time I have right now. I started on that work in a branch if you're curious https://github.com/conradludgate/leaky-bucket/commit/bd42c2ed081e1b5dd59eec51a1b4671644e92126

conradludgate avatar Mar 20 '24 08:03 conradludgate

I adopted test timers in #19 so that miri can be enabled in CI.

Using addr_of! instead of pointer arithmetic is a clear improvement since MSRV has been bumped (repr(C) could also be dropped for non-public types as a result). I'll have to take a closer look to make sure using pin-project-lite doesn't do the wrong thing when projecting out specific fields which might or might not be accessible at the time.

udoprog avatar Mar 20 '24 12:03 udoprog

At Neon, we recently had a panic triggered in tokio RawTask Waker functions, specifically about mismatched ref counts, it appeared in the callstack of leaky-bucket. We didn't find anything wrong in the tokio code, but given leaky-bucket was in the stack frame, I started looking here.

About your original issue, you'll want to look into where the waker being passed to leaky-bucket is constructed. I don't think this crate does anything unsafe which would allow us to clone or use a waker so incorrectly that it would trip up its internal reference count (if that's how it's implemented). We simply call the waker functions we're being provided as-needed through its safe API, which is also why we show up in your call stack.

udoprog avatar Mar 20 '24 13:03 udoprog