rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Don't allow unwinding from Drop impls

Open Amanieu opened this issue 2 years ago • 89 comments

This RFC proposes to change this behavior to always abort if a panic attempts to escape a drop, even if this drop was invoked as part of normal function execution (as opposed to unwinding which already aborts with a double-panic).

Previous discussion: https://github.com/rust-lang/lang-team/issues/97

Rendered

FCP comment

Amanieu avatar Jul 05 '22 01:07 Amanieu

I think it would also be good to add an explicit section to the RFC about how users will find out about the change in behavior, just to document that going in - for example, release blog post and compat note would be a good start in my eyes, but we could also consider some softer change (e.g., adjusting the panic logging to say "this will abort starting in X release) to phase this in. AIUI, there's not really much static checking we can do, so that's probably about our limit...

Mark-Simulacrum avatar Jul 05 '22 02:07 Mark-Simulacrum

I'm a huge fan of this proposal. When writing the rustonomicon this was one of the few major details of rust I couldn't really get a straight answer from anyone about. From my perspective it existed in the same grey area as unwinding through FFI, where we just lacked institutional expertise on the subtleties of unwinding corner cases and had punted on it.

This is why I never really adequately documented it. The current behaviour is very subtle and not something even stdlib devs ever had institutional knowledge or consideration of.

Writing exception-safe unsafe code is already enough of a headache, partially because it comes up rarely enough that it's hard to stay actively vigilant for. I think like, BinaryHeap is one of the only places in liballoc that has to do some explicit exception-safety fuckery because it needs to invoke the user-defined comparator repeatedly while doing mutations that put it in a transiently invalid state. All those stars lining up is actually surprisingly rare! Most of the time there are distinct lookup, allocate, then modify phases; so code is trivially exception-safe (assuming your own code doesn't panic, which collections generally only do during the lookup and allocations, not the final modification). Having less places where this rare problem can surprisingly come up, especially from user code, seems like a great thing.

We've never had a good story about this situation and at this point I think we have ample evidence it should just be banished to the shadow realm for being more trouble than it's worth. Especially since the stdlib already has the established precedent of Buf and File both just swallowing errors on Drop. Those cases were going to be my one hand-wring for this change, so it turns out I have nothing to complain about! (My memory was that they panicked, but honestly I never really gave it much consideration).

Gankra avatar Jul 05 '22 02:07 Gankra

I'd like to rehash my earlier arguments against this change, for those starting from this issue. I'd like to begin with some statements that I think everyone here agrees with:

  • Disregarding code size, this change only assists unsafe code. All safe code remains safe in the presence of unwinding panics.
  • Authors of some applications, such as certain web servers, would strongly prefer that threads unwind on critical failure rather than abort the entire process.
  • Some real-world Rust code can panic on drop today. (For instance, a Drop impl that calls println!() can panic if it fails to write.)
  • Today, a normally-unwinding panic from a Drop impl aborts the process if and only if either -C panic=abort is set, or the thread is already unwinding. With this change, a panic from a Drop impl will abort unconditionally.
  • As a consequence, some code that used to unwind the thread will now abort the process, to the (perhaps minor) detriment of certain web servers and some other applications.

Thus, I believe that since this is a silent change that will harm some current use cases, we should really show that the benefit of this change is greater than the cost, and that there are neither more beneficial nor less costly alternatives. I argue that such an alternative exists: keep panic-in-drop=unwind as the default, and provide better tools for writing drop-sensitive unsafe code.

Currently, the RFC twice compares this change to C++11's noexcept destructors, and only briefly mentions the differences in the "Prior Art" section. However, these differences are very important. In C++, each function parameter and local variable runs its destructor at the end of its scope. There is no way to prevent this, short of performing certain hacks with a union. There is also no way to run the destructor explicitly; even a move operation creates a new value and leaves a sentinel which must be destructed. But in Rust, destruction is determined by ownership semantics. If we pass a variable into a function parameter by value, the function (or data structure it is attached to) will become responsible for dropping it. This allows us to explicitly drop any value at any point in time, by passing it to the aptly-named drop() function.

The ability of Rust code to explicitly control dropping is quite relevant here. In C++, destruction is always invisible and uncontrollable (except if a value is placed into an std::optional which is reset), and so it makes a lot of sense not to mandate exception-safety surrounding it. In Rust, we can make drops clearly visible in unsafe code by using drop() and similar constructions, so the rationale doesn't apply nearly as absolutely.

The problem is, unsafe code today can easily implicitly drop values without realizing there's an issue; just look at the catch_unwind footgun (rust-lang/rust#86027). I believe that the best and most direct way to mitigate this is to provide better tools, both in the compiler and in the standard library, to catch these footguns instead of leaving them for the unsafe code's author to figure out. Importantly, we should clearly recommend using these tools in our docs and our books. To give an example, one crude but effective tool, from a suggestion by @danielhenrymantilla, would be an allow-by-default lint that triggers on every variable that implicitly calls Drop::drop(), and suggests explicitly drop()ping the variable. We could advise library authors to activate this lint on any panic-sensitive code that handles trait objects or user data types.


There are a few different variations on how the panic-in-drop option is to be handled. To summarize my position on each:

  • -Z panic-in-drop=unwind by default, -Z panic-in-drop=abort allowed: The current status quo. Acceptable, given additional tools to help unsafe code authors.
  • -C panic-in-drop=unwind by default, -C panic-in-drop=abort allowed: My preferred option, given additional tools. Users could manually choose panic-in-drop=abort for the improved code size and compile times, without needing the fully-fledged panic=abort.
  • -Z panic-in-drop=abort by default, -Z panic-in-drop=unwind allowed: This proposal, which I am against. It provides the code-size and compile-time improvements, but introduces a silent change that restricts some use cases with no stable workaround.
  • -C panic-in-drop=abort by default, -C panic-in-drop=unwind allowed: I am strongly against this. It would make unwinding drops possible, but it would fracture the package ecosystem: new unsafe container libraries could easily ignore support for the now-special panic-on-drop=unwind configuration, making the option very dangerous to use.

LegionMammal978 avatar Jul 05 '22 04:07 LegionMammal978

I would like to echo that the RFC as currently written puts too much weight on the code size / compilation benefits of the change. These are a strong argument for making the option available, but distract from the correctness argument for making it mandatory. Presenting this in the main text which is concerned with justifying changing the semantics of existing code serves to distinctly weaken the justification, since the argued position is not necessary for a primary argued benefit.

If the change is to be made, it should be justified on the unwind safety benefits alone, and the other benefits are just a very convenient bonus.

There is additionally a third undiscussed option: panic-in-drop=catch. Effectively, this automatically converts types to the recommended panic-in-drop=abort behavior of attempting as much cleanup as possible but swallowing the error and continuing on if something goes wrong. Of course, this is not without its own annoying problems (e.g. a field relying on the parent destructor to put it back into a state that is safe to drop... though imho that should definitely be using ManuallyDrop already anyway), but it does still have the distinct advantage of providing the nounwind benefit while avoiding introducing new implicit aborts into existing code.

I also need to point out what I think should be addressed, or at least acknowledged as a drawback: it is impossible to drop a structure where dropping its fields may panic without risking a panic-in-drop=abort. Even if you use ManuallyDrop to drop fields within your drop impl rather than after, the field's drop glue will cause an abort, so if you have legacy code that e.g. calls eprintln! in its destructor[^1], you have no recourse other than to forget to drop that field and leak its resources.

[^1]: ... I have some code to go update to be more resilient against double panics.

A final really silly option: add ptr::drop_in_place_maybe_unwind. This would explicitly generate and call panic-in-drop=unwind style drop glue, allowing you to lift ~~panicking~~aborting destructors into a panicking close automatically.


If the compiler had some sort of #[no_panic] support that could be used for drop impls, I would be less worried about a change along these lines, because it would be at least somewhat possible to verify that drops don't cause an abort in normal execution.

CAD97 avatar Jul 05 '22 05:07 CAD97

I'll just note that the performance and code size changes are pretty inconsequential, for a break in backwards compatibility. 3-5% decrease in code size is irrelevant for all but the most resource-constrained applications, which will likely use panic=abort anyway.

Also, while the change would possibly simplify the correctness of unsafe code, it would compromise the correctness of the application at large, which may rely on properly handling Drop panics. This is a big issue, both because the application is much harder to audit due to its size and lack of compiler-enforced panic boundaries, and because the application is usually written by less experienced users, including total newbies, while unsafe code is relegated to the experts.

The change in behaviour of Drop would be a breaking change, and a very insidious one, since it is entirely silent and hard to audit. That's the worst kind of breakage that can be introduced. It is also well-known that Rust currently allow panic in Drop, so many crates may rely on it. scopeguard is one mentioned example, Another example is the drop_bomb crate, which is used in 53 dependencies, including rust-analyzer. I'm sure there are Drop impls in the wild which use the same pattern without explicitly using those dependencies.

afetisov avatar Jul 05 '22 12:07 afetisov

An attribute that specific unsafe types can put on their own drop impl if they need to be absolutely sure that no unwind can escape their drop is one thing. This is too much.

Lokathor avatar Jul 05 '22 12:07 Lokathor

I feel this is a throwing baby with the bathwater situation. I'd prefer exploration of alternative designs. For example:

  • Add ability to forbid all panics in a function or a block of code. For unsafe code panics are dangerous, but panic in Drop is only a one of many cases. There could be unexpected panics lurking in Deref and overloaded operators. unsafe needs to be aware of all of them, so this solution is insufficient.

  • catch_unwind is a terrible gotcha that definitely needs addressing, but it also is a very unique case. There is no other panic-catching function, so there is no other place in Rust where code that clearly doesn't want a panic may still get one. I feel that a solution narrowly targeted specifically at catch_unwind would be sufficient to solve this gotcha, e.g. forbid panic in drop only in types used with catch_unwind (and dyn if necessary) or mark only callers of catch_unwind with panic=abort. There could be a safer catch_unwind variant that requires return type to be Copy.

  • For applications concerned about code size overhead there is panic=abort already. If a more granular solution is needed, this could be offered with an explicit function attribute.

kornelski avatar Jul 05 '22 16:07 kornelski

I would like to echo that the RFC as currently written puts too much weight on the code size / compilation benefits of the change. These are a strong argument for making the option available, but distract from the correctness argument for making it mandatory. Presenting this in the main text which is concerned with justifying changing the semantics of existing code serves to distinctly weaken the justification, since the argued position is not necessary for a primary argued benefit.

I personally find the code size argument more convincing since it is backed by hard numbers which show the improvement. The exception safety argument is inherently subjective since it can just be countered by saying "well it's your fault for writing buggy unsafe code". It's not impossible to write exception-safe Rust code, just difficult.

Also, while the change would possibly simplify the correctness of unsafe code, it would compromise the correctness of the application at large, which may rely on properly handling Drop panics. This is a big issue, both because the application is much harder to audit due to its size and lack of compiler-enforced panic boundaries, and because the application is usually written by less experienced users, including total newbies, while unsafe code is relegated to the experts.

I find this argument quite strange. Does any existing code actually rely on panics for normal operation? In general, panics will only happen in the case of buggy code, when an unexpected error occurs. Additionally, the proportion of code executed by Drop impls is tiny compared to non-drop code: assuming an even distribution of potential unexpected panics in a codebase, they will almost always come from non-drop code.

There's additionally another intriguing middle ground which should be listed as an alternative: drop glue / drop_in_place can still unwind, and a field's drop glue unwinding continues to unwind and drop other fields (siblings and of parent containing types), but unwind is implicitly caught and turned into an abort only when the unwind exits from the drop glue stack into a calling function.

This is something that has been suggested a lot. If I understand correctly, what you are suggesting is that unwinding out of a drop aborts only for implicit drops but not explicit ones. The problem is: if you already know which drop calls are going to potentially panic then you could just convert that code to use a close method like I suggested in the example. I don't see much benefit in adding compiler support for something which can easily be done in library code.

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding.

The RAII drop glue calls drop_in_place, so making that function nounwind will affect both explicit and implicit drops.

I would much rather see (and am drafting) a proposal providing APIs for unsafe code to control unwinds more carefully; namely some combination of:

The problem with these is that none of them give the same code size & compilation time benefits as the approach proposed in this RFC. This is something that I feel quite strongly about: a 10% speedup in compilation time on a crate like syn is huge.

Amanieu avatar Jul 06 '22 00:07 Amanieu

I personally find the code size argument more convincing since it is backed by hard numbers which show the improvement. The exception safety argument is inherently subjective since it can just be countered by saying "well it's your fault for writing buggy unsafe code". It's not impossible to write exception-safe Rust code, just difficult.

Given that the Rust standard library (ostensibly a codebase with a much higher level of rigor and review than most crates) has gotten this wrong on several occasions, I'm doubtful that most developers would be able to disarm this footgun, or even recognize that this exists.

carbotaniuman avatar Jul 06 '22 01:07 carbotaniuman

Does any existing code actually rely on panics for normal operation? In general, panics will only happen in the case of buggy code, when an unexpected error occurs.

Panics may be used to implement non-local exception-based control flow, even though it is clearly not an intended usage.

But to the point, while I doubt that panics could occur during normal operation of most non-buggy code, the correctness argument is about the graceful handling of bugs. Code may rely on being able to catch the panics for correctness. A long-running web server, or an embedded device, or a security-critical firmware, or an OS kernel absolutely should not panic. End-users may rely on being able to catch any possible panics with catch_unwind, and currently this includes single panics occurring in Drop glue. Note that the panic strategy is chosen by the end-user during compilation. If one sets -C panic=unwind, they can reasonably expect that they will be able to handle panics.

With the proposed changes, a relatively benign panic for some rare edge case will be escalated to a total crash, which can lead to DoS or worse.

Additionally, the proportion of code executed by Drop impls is tiny compared to non-drop code: assuming an even distribution of potential unexpected panics in a codebase, they will almost always come from non-drop code.

The argument about distribution of panics is quite weird, I don't know what to make of it. Since Drop must deal with disposing potentially fallible resources, I find it likely that Drop impls on average have more panics, but this is also a baseless assumption.

afetisov avatar Jul 06 '22 01:07 afetisov

I personally find the code size argument more convincing since it is backed by hard numbers which show the improvement.

But the code size can be achieved by opt-in or opt-out behavior. It is not a justification for making panic-in-drop=abort the only option. It can serve as a supporting argument to making the change, but it cannot be the primary argument, because it does not justify making this breaking change to behavior.

I find this argument quite strange. Does any existing code actually rely on panics for normal operation?

Yes. For example, rust-analyzer uses unwinds for cancellation. Unwinding is the semantically correct way to handle cancellation, and is essentially what cancelling an async function and dropping the witness table looks like (just without thread::panicking()).

This is not unique to just rust-analyzer; this is a design choice of salsa.

It has generally been understood that while you SHOULD NOT require unwinding support in a library (and MUST support unwinding), you MAY rely on unwinding being available if you control the application compilation.

I agree that likely no code is deliberately relying on unwinding out of drop for application correctness, but it is the case that long-running programs intend to catch any task unwind, and turning unwinds that would have been recovered from into aborts is adding a new DoS vector to such servers.

This is something that has been suggested a lot. If I understand correctly, what you are suggesting is that unwinding out of a drop aborts only for implicit drops but not explicit ones. The problem is: if you already know which drop calls are going to potentially panic then you could just convert that code to use a close method like I suggested in the example. I don't see much benefit in adding compiler support for something which can easily be done in library code.

Then I can take this argument and apply it unchanged to making drop_in_place nounwind as well! You already know where things are going to potentially panic (every drop_in_place call) and can convert the code to use a close method.

Unless I'm sorely mistaken, the justification for making this a mandatory breaking change without opt-out is that it is too difficult to know which drop calls are going to potentially panic, and the proposed resolution is to make no drop calls panic. I fail to see how this alternative fails at the goal, as every point that can unwind is clearly marked. And if you do want the behavior where drop_in_place aborts unwinds as well, drop_in_place has a reminder on it to "remember that this can unwind from panicking drop glue! If this is undesirable, wrap the call in a drop_unwind or call drop_in_place_no_unwind instead."

I'd even be happy if we took the usual "safer by default" approach (as with copy/copy_nonoverlapping, sort/sort_unstable) and made drop_in_place abort unwinds but provided (and drop glue used) drop_in_place_allow_unwinds. The thesis I am defending is to provide safeguards but with the choice and tools to allow unwinds.

The problem with these is that none of them give the same code size & compilation time benefits as the approach proposed in this RFC. This is something that I feel quite strongly about: a 10% speedup in compilation time on a crate like syn is huge.

I will again also reiterate that the compile-time and code-size improvements are not an argument to make this mandatory. They are a very good argument to make it an option. They may even support making this the default. They would definitely be an argument against adding panic-in-drop=unwind if the existing behavior was panic-in-drop=abort.

Swift's lack of deinit throws is reasonable prior art to point to for not providing fallible destruction. However, Swift has a large advantage here: they don't have any unwinding. Your only choice is throws or abort. This means that you have a built-in guarantee that a Swift destructor doesn't contain an uncaught exception/unwind. In fact, in Swift, indexing an Array out of bounds aborts. (Most collections return an optional value from subscripts, but Array doesn't.) Basically any developer error which would panic! in Rust is an abort in Swift.

If anything, this is an argument in support of -Cpanic=abort becoming the default instead of -Cpanic=unwind. I would support such a change. (This may run into similar issues around encouraging crates which don't support unwinding, but tests continuing to be -Cpanic=unwind I think serves as a strong force against this.)

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding.

The RAII drop glue calls drop_in_place, so making that function nounwind will affect both explicit and implicit drops.

Yes, but I'm not arguing that point. It is a perfectly viable strategy for the unwind-to-abort shim to be added in the drop glue rather than in drop_in_place.


Given that the Rust standard library (ostensibly a codebase with a much higher level of rigor and review than most crates) has gotten this wrong on several occasions, I'm doubtful that most developers would be able to disarm this footgun, or even recognize that this exists.

Nobody (including std devs) really understood the issue of a caught unwind's drop unwinding until recently. (This discussion started almost immediately after it was discovered.) We can attack this with documentation on catch_unwind and any other APIs for managing unwinds.

Other than this specific case, std has generally had no issue with unwind safety, because it is something you so rarely have to think about that std basically only encounters it for BTreeMap and where catch_unwind unwinds.

CAD97 avatar Jul 06 '22 02:07 CAD97

Code may rely on being able to catch the panics for correctness. A long-running web server, or an embedded device, or a security-critical firmware, or an OS kernel absolutely should not panic.

Also many TUI applications depend on unwinding to avoid getting stuck in raw mode when panicking. Very few register an atexit callback and signal handlers to exit raw mode on crashes.

bjorn3 avatar Jul 06 '22 07:07 bjorn3

In order to justify changing the default, the RFC should justify the claim that RAII drop glue is an issue, and not just explicit drop_in_place calls unwinding. I would much rather see (and am drafting) a proposal providing APIs for unsafe code to control unwinds more carefully

I wrote up an outline for a related alternative in an IRLO thread a while back. The idea is to have a catch_unwind_v2<F, R>(...) -> UnwindResult<R>, where UnwindResult is #[must_use] and has many of the methods you listed, acting as a sort of by-value builder API. To summarize what I wrote there:

use std::{any::Any, panic::UnwindSafe};

pub fn catch_unwind_v2<F: FnOnce() -> R + UnwindSafe, R>(f: F) -> UnwindResult<R>;

#[must_use]
pub struct UnwindResult<T> { ... }

impl<T> UnwindResult<T> {
    // Unwinding functions
    pub fn into_result(self) -> Result<T, Box<dyn Any + Send>>;
    pub fn drop_err(self) -> Option<T>;
    pub fn resume_unwind(self) -> T;

    // Unwind-free functions
    pub fn inspect_err<F>(self, f: F) -> UnwindResult<T>
    where
        F: FnOnce(&(dyn Any + Send + 'static));

    pub fn inspect_err_mut<F>(mut self, f: F) -> UnwindResult<T>
    where
        F: FnOnce(&mut (dyn Any + Send + 'static));

    pub fn forget_err(self) -> Option<T>;
    pub fn unwrap_or_abort(self) -> T;
    pub fn try_drop_err(self) -> Result<T, UnwindResult<()>>;
    pub fn drop_err_or_forget(self) -> Option<T>;
    pub fn drop_err_or_abort(self) -> Option<T>;

    // Add `new()`, `map()`, etc. as desired
}

I've been doing some surveying of real-world catch_unwind usage (via GitHub Code Search) to evaluate this; I'll probably post my findings in a few days.

LegionMammal978 avatar Jul 06 '22 21:07 LegionMammal978

After a bit of research, I wanna say I don't particularly find the comparisons to C++ noexcept particularly compelling. In C++, the basic exception guarantee is as follows:

If the function throws an exception, the program is in a valid state. No resources are leaked, and all objects' invariants are intact.

Rust code provides neither of these guarantees, even in idiomatic code. (It is only better than having no exception guarantee that memory corruption iins not allowed). Leak amplification is used throughout the Rust stdlib (unlike C++).

In addition, unwinds do not require object invariants to be preserved, which is the entire reason why UnwindSafe is a thing, but issues like rust-lang/rust#89832 aren't really considered worth fixing. Observing broken object invariants in Rust is safe, for better or for worse.

Given that objects with throwing destructors were already forbidden in STL containers, the change in C++ likely impacted a lot less code than these proposed Rust changes. And I think given the difference in semantics for unwinding vs exceptions, this change is too big a hammer.

carbotaniuman avatar Jul 07 '22 05:07 carbotaniuman

Yeah, UnwindSafe is kinda "fake" in the sense that it doesn't relate to unsafe at all.

Lokathor avatar Jul 07 '22 06:07 Lokathor

The RFC text (in the Downsides) ought IMO to discuss the "drop bomb" pattern, sometimes used to simulate linear types - or rather, to avoid shipping code which treats a particular type as nonlinear. E.g. https://github.com/rust-lang/rust/pull/78802#issuecomment-1177948974

This is relevant here, because drop bombs precisely involve deliberately panicking inside drop, in circumstances where the author would probably have made very different design choices if they expected that such a panic would be an immediately irrecoverable abort.

ijackson avatar Jul 07 '22 17:07 ijackson

Drop bombs are usually being used to cover up for things you would want to be compile-time errors, so it's not at all clear to me that you would care about whether the bomb unwinds or aborts -- if linear (true must-use) type lovers had their druthers, the program wouldn't have even been runnable!

Gankra avatar Jul 07 '22 17:07 Gankra

I think there's been a huge amount of focus on the optimization possibilities here, as opposed to the correctness argument. The optimization possibilities would justify creating an option, but not changing the default. The correctness issue justifies changing the default.

I think the correctness argument is the primary reason to do this; that it enables great optimizations is a bonus.

Handling objects that panic in their drop implementations is incredibly complex, and as noted, even the standard library has gotten it wrong. I think it's reasonable for us to evaluate whether we expect libraries to handle this, or whether we want to proscribe it entirely so that libraries don't have to handle it.

I am quite sympathetic to cases like long-running servers that want to catch panics and turn them into 500 Internal Server Error or similar, and keep running. However:

turning unwinds that would have been recovered from into aborts is adding a new DoS vector to such servers

Such servers already have multiple DoS vectors:

  • If you panic while holding a lock, you'll poison the lock, and everything touching that lock is unlikely to be able to recover.
  • If you encounter another panic while panicking, you'll already abort.

This proposal is not making all panics abort; I agree that that would be a huge change to the default. This proposal is making panics in drop abort. And it's already possible to make servers robust against that, by adding close() methods and similar.

joshtriplett avatar Jul 12 '22 16:07 joshtriplett

"you already can't recover from lock poisoning" was obviously a gap in the design, and closing that gap is already in progress: https://github.com/rust-lang/rust/issues/96469 So we're actually already working to improve the DoS situation, and the RFC as written would worsen things.

There's three alternatives given, including "opt-in to unwinding", but there isn't an alternative given for "opt-out of unwinding". Since particular unsafe code is the code that can go wrong with a panic in drop (not all unsafe code does), why don't we put the burden on the particular instance of unwind sensitive unsafe code to opt in to having a drop that will abort if an unwind tries to escape from it? Call it #[about_on_unwind_escape] or whatever name. That seems the least disruptive to existing code, while allowing unsafe code the tools it needs to be correct.

Lokathor avatar Jul 12 '22 16:07 Lokathor

@Amanieu We discussed this in today's @rust-lang/lang meeting.

We felt like all of the arguments have been made at this point, and now we need to weigh the tradeoffs between possibilities:

  • Do we feel that we gain more robustness by support catch_unwind on panics from Drop, at the expense of the lower robustness of all the library code that doesn't handle this?
  • Or do we feel that we gain more robustness by eliminating this case as something libraries need to handle, at the expense of the lower robustness of code that wants to handle all possible panics including panics in Drop?

However, before we do that, there are a few changes we'd like to see:

  • The correctness argument should be at the forefront, as that's the primary justification for doing this. Performance is a bonus, and not a reason to change the default.
  • There should be some kind of hook that runs when aborting, to give code an opportunity to do cleanups. (The same hook should run on all panics with panic=abort.)

joshtriplett avatar Jul 12 '22 17:07 joshtriplett

The correctness argument should be at the forefront, as that's the primary justification for doing this.

Once the correctness argument is at the forefront, I think it would be great for the RFC to also consider (or at least mention) alternatives that would aid the user in writing correct code, such as:

  1. a less error-prone catch_unwind API.
  2. a lint against any implicit method calls.
  3. some form of #[panics(never)] attribute.

I'm no expert on the compiler, but I feel like (2) should be reasonably easy to implement.

Implementing (3) in a useful manner would likely require a lot of additional design work[^1].

[^1]: Last time I though about this I ended up with multiple additional qualifiers like #[panics(never(verify_unstable)], #[panics(never(assume_unsafe)] or #[panics(never(type_dependent)]. Some could be safely combined (like #[panics(never(verify_unstable,type_dependent)]) but others should be linted against (like #[panics(never(assume_unsafe,type_dependent)] unless the annotated function is unsafe). And those qualifiers still don't cover all generic use cases.

TimNN avatar Jul 12 '22 18:07 TimNN

There should be some kind of hook that runs when aborting, to give code an opportunity to do cleanups. (The same hook should run on all panics with panic=abort.)

This already exists: an unwind guard will call the panic handler with a PanicInfo that return false for can_unwind (#92988). std's panic handler will call the panic hook as normal but then abort instead of unwinding.

Amanieu avatar Jul 12 '22 18:07 Amanieu

a less error-prone catch_unwind API.

FWIW, the catch_unwind API problem is kind of already paliated with https://github.com/rust-lang/rust/pull/99032.

Then, with @LegionMammal978's proposal, and potentially with also something like ::unwind_safe's API, we'll be able to offer rich and more powerful APIs for future Rust code (I'm thus looking forward to your counter-proposal, @CAD97!), without needing to break language semantics.


I'll reiterate that the tool that would help catch unwind-related footguns,

  • (regarding which the drop ones have taken a gigantic attention with this proposal, but .-derefs, {:?}-formatting, + and-the-like operations, .awaits, etc., are other sneaky places where unwinds can happen)

is to have an allow-by-default lint against API usage that may unwind (or implicit drops if that is still deemed to be that devilish). This is definitely information accessible to the MIR output, so such a lint ought not to be that difficult to write.

So, to go back to:

Handling objects that panic in their drop implementations is incredibly complex, and as noted, even the standard library has gotten it wrong. I think it's reasonable for us to evaluate whether we expect libraries to handle this, or whether we want to proscribe it entirely so that libraries don't have to handle it.

I think that before we ask that question, especially considering breaking semantics of the language, that we should first give a try to these helpful tools.

In summary:

  1. merging https://github.com/rust-lang/rust/pull/99032 to get catch_unwind's footgun tackled, and maybe also implementing @LegionMammal978's and @CAD97's proposal for an even better API in that fashion;
  2. featuring a lint or some other tool to help identify unwind points;
  3. as well as blessing scopeguard and/or unwind_safe patterns to be part of std, since with those tools writing unwind-safe code is quite easy, to be honest (thanks to nested unwinds aborting).

Once all this is done, if the situation is still deemed problematic w.r.t. unwinds in implicit drops or with other sneaky ops, only then a PR to tackle implicit drops (but no other sneaky ops, and also the innocent explicit drop bystander), imho, ought to be considered.

danielhenrymantilla avatar Jul 12 '22 21:07 danielhenrymantilla

This already exists: an unwind guard will call the panic handler with a PanicInfo that return false for can_unwind (#92988). std's panic handler will call the panic hook as normal but then abort instead of unwinding.

I may be overlooking it, but is there an explanation of that in the RFC? It'd help to make sure people know what options they'd have.

joshtriplett avatar Jul 12 '22 21:07 joshtriplett

One thing that popped up while writing my counterproposal: would forbidding unwinding from Drop::drop be able to subsume the panic-in-unwind immediate abort? At least internally it looks like this behavior has not been formally guaranteed.

This sounds extremely scary, but a strong mitigating factor is that existing unwind-to-abort shims still work, because they work by unwinding inside of a destructor. I think it at least reasonably possible that nobody has written code of the form

impl Drop for UnwindsAreAborts {
    fn drop(&mut self) {
        // if unwinding, the panic aborts
        // if not unwinding, catch the unwind and continue
        catch_unwind(|| panic!());
    }
}

or

if panicking() {
    panic!();
    // that aborted, so we can do whatever
    unreachable_unchecked();
}

If we lose the ability to recover from panics escaping drops, but gain the ability to recover from neatly nested panics within drops, even while already unwinding, then it's not just a loss in abort-resilience anymore, but a tradeoff.

Unfortunately, for the case of foreign unwinds, we still require an immediate abort, or else it is UB at the ABI level, as on e.g. the Itanium psABI it is undefined behavior to nest foreign unwinds from different runtimes. (Disclaimer: I do not know what is required to support nested unwinds within a given unwind provider.)

CAD97 avatar Jul 14 '22 06:07 CAD97

One thing that popped up while writing my counterproposal: would forbidding unwinding from Drop::drop be able to subsume the panic-in-unwind immediate abort? At least internally it looks like this behavior has not been formally guaranteed.

This is mostly orthogonal to this RFC: it's just a matter of removing the panic count and instead relying on the abort guards to catch double panics.

One issue is that std::thread::panicking only checks the panic count but is unable to tell whether a Drop impl is being executed as part of normal execution or is invoked from an unwind landing pad:

fn foo() {
    defer! {
        catch_unwind(|| {
            // std::thread::panicking() returns true here.
        });
    }
    panic!();
}

I am personally not a big fan of the panicking function and would like to see it deprecated, but unfortunately it is quite widely used.

Unfortunately, for the case of foreign unwinds, we still require an immediate abort, or else it is UB at the ABI level, as on e.g. the Itanium psABI it is undefined behavior to nest foreign unwinds from different runtimes. (Disclaimer: I do not know what is required to support nested unwinds within a given unwind provider.)

Although it is technically UB in the ABI, here's what happens in practice:

  • In C++: catching a Rust panic while a C++ exception is active[^1] aborts (calls std::terminate to be precise).
  • In C++: catching a C++ exception while a Rust panic is active just works.
  • In Rust: catching a C++ exception just works, even if a Rust panic is active[^2].
  • In Rust: catching a Rust panic just works, even if a C++ exception is active.

[^1]: "Active" here means that we are in a catch block for this exception or we are in a destructor that is being run by the unwinder. [^2]: Rust doesn't have catch blocks (the catch block is hidden inside catch_unwind) so in practice this just means that we are currently running destructors from the unwinder.

Amanieu avatar Jul 14 '22 10:07 Amanieu

[Allowing contained panics during an unwind] is mostly orthogonal to this RFC

While implementation-wise I agree that this is orthogonal, it is imho strictly-happens-after this RFC, because of the current reliance[^1] on panic-in-unwind producing an abort for soundness.

[^1]: Note to self: replace any instances of this I see with panic!(); abort();. Perhaps another use case for panic_abort!?

Because of this, supporting such is a supporting argument for requiring unwind-in-drop=abort. I would personally predicate accepting this RFC on also migrating to allowing neatly nested unwinds. Making this possible is a primary benefit of this RFC's change.

In fact, making unwinds properly nest is perhaps the benefit of this RFC. The surprise factor of drop unwinding may be directly the result of an unwind escaping drop being an improperly nested unwind.

The reason for this is that the change is then actually a net increase in robustness against panics. It means that unwinding through otherwise normal code is no longer a potential abort; it makes relying on unwinding better. Currently, using unwinds for control flow is strongly discouraged, because it has the potential to abort even if you control -Cpanic=unwind. Applications can and do rely on unwinding for correctness, however[^2], so improving the resilience of doing so at the cost of unwinds aborting when escaping Drop::drop (and they would've been forced to abort if we were unwinding) starts to seem like a reasonable trade-off.

[^2]: The previous example is salsa implementing synchronous cancellation via unwinds. I argue that unwinding is in fact the correct way to handle cancellation; async cancellation is also identical to an unwind except for the fact that the unwind is stackless/reified, so std::thread::panicking() returns false and starting a real unwind doesn't immediately abort (without this RFC).

I'm one of the most vocal critics of this RFC, but allowing neatly nested panics has made me significantly less resolute in my position. I'm seriously considering adding a more fleshed out than usual alternative in my RFC for "hey, this is an alternative I don't have a concrete rationale to prefer between" for the world where we allow nested unwinds and make Drop::drop abort escaping unwinds.

Combining both resolves my primary concern with either individually, and only leaves the implicit concern/risk of breaking users relying on ossified but formally nonguaranteed behavior (with a silent upgrade to UB).


I think I have a proposed implementation which could potentially get us both nested unwinds and unwinding dtors. However, it relies on specific details of the Itanium unwinding ABI to be implementable — namely, the two-phase lookup — in order to preserve immediate aborts when an unwind is improperly nested. The details are currently being recorded in the alternatives section of my counter-RFC, but I will need help to show it is implementable natively in the different native unwinding schemes we use. The behavior I think can be shimmed on top of any unwinding mechanism (incl. a monadic transform) the same way the panic_count is currently handled, but it would be very nice to remove the additional panic_count overhead and just use the existing unwinding runtime if possible.

CAD97 avatar Jul 14 '22 18:07 CAD97

I've updated the RFC to try to capture all of the feedback and proposed alternatives in this thread. However there has been a lot of discussion so I may have missed something, do let me know.

Amanieu avatar Jul 15 '22 22:07 Amanieu

Well, someone may have suggested "Allow individual Drop impls to opt-in to unwinding", but I suggested roughly the exact opposite: an attribute for Drops that need to opt-in to aborting. With the ability to opt-in to an abort-on-panic for a specific drop, particular unsafe code can guard itself, while most code can continue to unwind without trouble.

Lokathor avatar Jul 15 '22 23:07 Lokathor

To provide an extra data point, I've found some widespread FFI code that intentionally relies on calling resume_unwind() within Drop::drop(). In openssl v0.10.41 (and its fork boring v2.0.0), the associated functions {EcKey, Rsa, PKey}::private_key_from_pem_callback() and PKey::private_key_from_pkcs8_callback() all go through a wrapper invoke_passwd_cb(). This wrapper calls catch_unwind(), storing the payload in the CallbackState. The CallbackState then calls resume_unwind() once it is dropped (source):

pub struct CallbackState<F> {
    /// The user callback. Taken out of the `Option` when called.
    cb: Option<F>,
    /// If the callback panics, we place the panic object here, to be re-thrown once OpenSSL
    /// returns.
    panic: Option<Box<dyn Any + Send + 'static>>,
}

/* ... */

impl<F> Drop for CallbackState<F> {
    fn drop(&mut self) {
        if let Some(panic) = self.panic.take() {
            panic::resume_unwind(panic);
        }
    }
}

LegionMammal978 avatar Jul 16 '22 01:07 LegionMammal978