lilos
lilos copied to clipboard
Could lilos_handoff::Push take &mut Option<T> to be cancel safe?
This way the pointer could be directly provided so the T isn't taken out unless the future succeeds.
I believe while this fixes the issue with dropping the future returned by Push::push
, API consumers cannot avoid being weakly cancel-safe themselves. To use such an API, the API consumer would need to have space for the T
in its own future (or call stack). As soon as this future is cancelled, the T
is dropped instead of being processed.
As I understand the problem, we want to perform two operations: generating some value and sending this value to a different task. With the current API is that we always need to generate the value before sending it. Ideally, we would only generate the value when we can be sure to succeed in sending it. lilos::spsc::Permit
solves this neatly, as we can obtain the permit before generating the value. Once we obtain the permit, we are assured that there is space in the queue and sending the value will succeed without having to await. Therefore, we can generate and send the value in a single invocation of poll.
I have to think about that. I wonder it it could actually be done soundly.. if the pop end is waiting, we get a permit. Then the pop future is cancelled and forgotten (no Drop), but the permit would still seem valid. I think this one-step approach is the only valid option.
That said, maybe an API that takes make_value: impl FnOnce() -> T
would work?
Okay so I clearly need to fix my Github notification settings. Hello from like seven months later!
On @korrat's point - I tried rewriting the handoff
API to take a closure, but it turns out to be incredibly subtle -- since the closure needs to be called from the taker's context it wasn't clear that it could be done. Or at least, I couldn't figure out how to do it. I'm definitely open to insights.
@mattfbacon - I think your push_from
operation is clever. I need to do some testing in my codebases to see what the generated code looks like. I feel like it addresses 90% of my cancel correctness concerns with handoff
-- while @korrat is correct that it doesn't save the caller from doing a potentially destructive or expensive operation to produce the T
, it does at least let the caller "put it back" on cancellation of the pusher future.
The fact that some_handoff.push_from(&mut None)
compiles is a bit unfortunate. Your decision to treat it as a no-op in #11 is one of several ways to address that, and has the advantage of generating very little code compared to a panic. I think your approach may be reasonable.
After thinking about @korrat's analysis in #14, I'm less confident about push_from
, because it isn't sufficient to make the transfer truly cancel-correct. The sequence of events in question:
- Task A parks at
pop
waiting for rendezvous. - Task B executes
push_from
, which immediately completes because A was waiting. B has lost access to the item it's sending. - Before resolving the
pop
future, Task A cancels it. This could be a result of e.g. popping from the handoff within aselect_biased!
that chooses an earlier branch and bails. The item is now lost with no indication.
There are various small changes I could imagine to address one piece or the other -- for instance, we could maintain a "data lost" flag so that there's at least some indication that this happened, though I don't see how I would use that in practice to fix an application.
I am currently leaning toward concluding that handoff
, while super useful, is inherently cancel-unsafe, and can only be used in cases where data loss is tolerable (rare) or where both sides are careful not to cancel the push/pop futures (more common).
I've also tested the associated PR on some of my firmware, and it increases code size by a small but measurable amount. I'm investigating the causes. I suspect that the compiler is unable to entirely eliminate the operations on the Option, which is odd since it appears to be totally inlining push_from
.
Given (1) the realization that a minimum-copy handoff
may be inherently cancel-unsafe, and (2) the increase in lines-of-code and bytes-in-flash, I'm now leaning toward not merging this PR in its current form.
I'm definitely open to alternatives, or to being convinced otherwise.
Ah, good point, it's cancel-safe in the case where the push_from happens before the pop, but not vice versa, because of the PopWait variant. Maybe I'm missing something, but why does the popper need to provide the location in this variant? Why couldn't the popper wait until the PushWait condition occurs, then take it from that?