rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

unsafe attributes

Open RalfJung opened this issue 1 year ago • 88 comments

Consider some attributes 'unsafe', so that they must only be used like this:

#[unsafe(no_mangle)]

Rendered Prior discussion on IRLO

RalfJung avatar Oct 11 '22 10:10 RalfJung

I don't think that this RFC offers much for library developers.

Normally, when you write a library and use an unsafe block, you can know for sure (because of a runtime check or because of a type system trick, or for some other reason) that your unsafe operation is justified at that point in the code. That is actually why you write the unsafe block. If you can't be sure that everything is alright, the you do not write that unsafe block. Perhaps you make the entire function an unsafe fn and pass the obligation on to the caller. It doesn't matter. What matters is that you do not use unsafe blocks unless you have the situational knowledge to justify why the contents of the block won't cause UB in this situation. I think we all agree on this position.

However, say that a library needs to export a function under a particular name. The library author needs to use either no_mangle or export_name. These are unsafe attributes. The unsafety here is that if a given symbol is defined more than once among all the object files being linked together to form an executable, it's UB. But the library author does not control what object files get linked into the final executable, and so they absolutely should not write "SAFETY: there is no other global function of this name" next to #[unsafe(no_mangle)]. That is not a promise that the library author can keep.

The same goes for link_section. The library author does not decide on the linker script used to generate the executable, so they can't make a promise about the validity of using a special link section on a particular fn or static.

If you write an unsafe block without the proper pre-conditions, the code is considered to be unsound code. "please don't publish that on crates.io", and so on. This RFC forces library authors to write unsafe(...) around attribute usage that they are incapable of validating the safety pre-conditions for.

The RFC makes unsafe attribute usage more "grep-able", but not actually less unsafe or less error prone. This doesn't give a library author the ability to communicate to a user, "I'm doing this unusual thing, here's what you need to do on your end to make sure things work out". The RFC doesn't prevent UB at all. It just follows the absolute letter of the rust law that says "if there's UB then someone must have used unsafe", and now we can point to the literal ascii string having appeared in some file somewhere.

Lokathor avatar Oct 14 '22 23:10 Lokathor

"More greppable" is largely the point. While the specific attributes (no_mangle, link_section, link, link_name) may be grepped for manually, It also provides a general mechanism to communicate unsafety of arbitrary other attributes (including proc macros, custom attributes and external tool attributes). It's also much easier to grep for unsafe than to remember the full list of all attributes with unsafe behaviour. It also adds the concept of unsafety for attributes, which allows tools to depend on it (e.g. forbid usage of these attributes in the #![forbid(unsafe_code)] lint, or propagate unsafety through macros).

With regards to usage by library authors, you should think about it like unsafe fn rather than unsafe { } block. I.e. you, as the author, specify preconditions which all users must satisfy, even though you can't enforce them yourself. It's the same situation as an extern fn: you provide an unsafe API, but you can't know or make any guarantees about its consumers.

But I think @Lokathor 's objection can be rephrased in the following way: with unsafe fn, there are really two concepts at play. The function may be unsafe to use, enforcing preconditions on its consumers, or unsafe to implement, requiring extra care from its author and allowing the usage of unsafe calls inside. Many people think that it was a mistake to merge those two concepts into a single unsafe fn. We have the same issue with unsafe attributes: do they mean that the API consumers must uphold extra invariants, or that the function author promises that some invariants are satisfied? With #[unsafe(no_mangle)] (and all other proposed existing unsafe attributes) it's the former. With #[derive(unsafe(DangerousTrait))] it's likely the latter.

afetisov avatar Oct 15 '22 00:10 afetisov

I absolutely would not say that about unsafe fn, I think it is factually wrong.

Lokathor avatar Oct 15 '22 00:10 Lokathor

It sounds like the missing step there is the ability to mark your crate as unsafe

/// SAFETY: We require users of this crate to promise there is no other global `write` symbol
#[unsafe(no_mangle)]
pub fn write() {}

This provides something similar to item-level unsafe {} blocks, but we don't have a corresponding item-level unsafe fn declaration to wrap them.

Nemo157 avatar Oct 15 '22 09:10 Nemo157

We have the same issue with unsafe attributes: do they mean that the API consumers must uphold extra invariants, or that the function author promises that some invariants are satisfied? With #[unsafe(no_mangle)] (and all other proposed existing unsafe attributes) it's the former. With #[derive(unsafe(DangerousTrait))] it's likely the latter.

Both of these examples are the latter. You are discharging the unsafety of the attribute or the derive. There is no way existing or with this RFC to do the former.

Nemo157 avatar Oct 15 '22 09:10 Nemo157

The other thing is that if the compiler handled it then we could make no_mangle a safe attribute, but we're just not doing that.

Lokathor avatar Oct 15 '22 14:10 Lokathor

The other thing is that if the compiler handled it then we could make no_mangle a safe attribute, but we're just not doing that.

That’s quite impossible if you’re statically linking any C code or other precompiled objects into the executable, since the compiler doesn’t know what names that code might define or depend on. And on some platforms (Linux), symbol definitions can leak even across dynamically linked libraries.

comex avatar Oct 16 '22 01:10 comex

Well the compiler could scan at least all objects it knows go into the linking, even if they're written in non-rust.

But if symbols can leak in even after the initial linking then not only is there no hope to make no_mangle fully safe, but there's clearly also no hope you could ever discharge your safety obligation within a library. So I think that supports my point that just putting the string "unsafe" next to no_mangle is not itself improving things by much.

Idea for a separate RFC: Perhaps cargo could generate a list of all the unmangled symbols all your deps are bringing in so that you can check the list?

Lokathor avatar Oct 16 '22 05:10 Lokathor

However, say that a library needs to export a function under a particular name. The library author needs to use either no_mangle or export_name. These are unsafe attributes. The unsafety here is that if a given symbol is defined more than once among all the object files being linked together to form an executable, it's UB. But the library author does not control what object files get linked into the final executable, and so they absolutely should not write "SAFETY: there is no other global function of this name" next to #[unsafe(no_mangle)]. That is not a promise that the library author can keep.

That is just fundamental in the nature of these attributes -- they can only be checked on the whole-program "post-linking" level. There is no way to soundly use any of them in a library. The RFC doesn't change that, it just makes this existing fact more clear. I have amended the text to clarify that.

RalfJung avatar Oct 22 '22 10:10 RalfJung

Well the compiler could scan at least all objects it knows go into the linking, even if they're written in non-rust.

It would not cover all cases, so you would still need to mark them as unsafe.

I think such a scan would be orthogonal to the RFC: it could be performed (or not) regardless of whether we get unsafe attributes.

no hope to make no_mangle fully safe [...] putting the string "unsafe" next to no_mangle is not itself improving things by much.

Yeah, but since there is no way to make it happen in the general case, at least this one should be marked as such.

Of course, that doesn't mean you cannot add new, safe attributes in the future for similar use cases.

Perhaps cargo could generate a list of all the unmangled symbols all your deps are bringing in so that you can check the list?

The generated docs of rustdoc could be another good place to put them.

ojeda avatar Oct 22 '22 12:10 ojeda

@rfcbot merge

joshtriplett avatar Jan 19 '23 12:01 joshtriplett

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @joshtriplett
  • [x] @nikomatsakis
  • [x] @pnkfelix
  • [x] @scottmcm
  • [x] @tmandry

Concerns:

  • ~~change-syntax-to-drop-parentheses~~ resolved by https://github.com/rust-lang/rfcs/pull/3325#issuecomment-1458714974
  • ~~maybe-make-this-part-of-next-edition~~ resolved by https://github.com/rust-lang/rfcs/pull/3325#issuecomment-1458690311
  • ~~syntax-not-ideal~~ resolved by https://github.com/rust-lang/rfcs/pull/3325#issuecomment-2017689806

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. See this document for info about what commands tagged team members can give me.

rfcbot avatar Jan 19 '23 12:01 rfcbot

@rfcbot reviewed

However that could make it tricky to consistently support non-builtin unsafe attributes in the future, so the RFC proposes to not do that yet.

Not a blocking concern since this is forward-compatible, but I don't see how supporting safe attributes makes it difficult to support non-builtin unsafe attributes in the future. I suppose nesting could be problematic if we support nesting under arbitrary unknown attributes.

tmandry avatar Jan 26 '23 03:01 tmandry

I really don't know what is hard or easy with proc macro attributes and things like that, so after the first version of this PR (which was very liberal with where unsafe could be used) got feedback pointing out that could be tricky, I made it super restrictive.

RalfJung avatar Jan 29 '23 20:01 RalfJung

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot avatar Feb 21 '23 19:02 rfcbot

@rfcbot concern maybe-make-this-part-of-next-edition

During T-lang triage, I posed the question of whether we should not push to land this change across all Rust editions, and instead have it only become a change on e.g. Rust 2024.

If we do want this to be coupled to the edition boundary, then it would be good for the RFC to spell out how it is coupled to the edition. (E.g. maybe it is always warn-by-default for Rust editions < 2024, but is a hard error to leave out the unsafe-discharge for edition >= 2024.)

pnkfelix avatar Feb 21 '23 19:02 pnkfelix

@rfcbot approved

I'm in favor of the idea.

nikomatsakis avatar Feb 28 '23 19:02 nikomatsakis

@rfcbot reviewed

nikomatsakis avatar Feb 28 '23 19:02 nikomatsakis

One minor point: it's unclear from the RFC whether multiple unsafe attributes can be used within a single unsafe(...) block.

ie. #[unsafe(no_mangle, link_section="blah")]

The RFC says you can't use safe attributes within it, but doesn't specifically forbid using multiple unsafe attributes. Nor does it explicitly allow that or have an example of doing that.

Diggsey avatar Mar 01 '23 00:03 Diggsey

What exactly happens with an attribute and path grammar with this change? #[unsafe(...)] doesn't fit into the current path grammar because it's a keyword.

Does unsafe become a path segment keyword (like self or super), so the attribute grammar remains to be #[PATH(...)]? Or it's a new grammar production for attributes specifically (or rather "attribute items inside" [...])? Is unsafe supported inside cfg_attr - #[cfg_attr(TRUE, unsafe(foo))]? Does #[r#unsafe(...)] produce the same effect as #[unsafe(...)]?

petrochenkov avatar Mar 01 '23 07:03 petrochenkov

Also, how is unsafe expanded?

Is it a (built-in) procedural macro, or an active built-in attribute like cfg/cfg_attr, or an inert attribute "expanded" e.g. during lowering to HIR (then the alternatives with macros inside unsafe(...) become impossible)? Is it expanded out of order like cfg/cfg_attr or in left-to-right order like everything else?

Some observable behavior depends on the answers for all of the syntax/expansion questions above.

petrochenkov avatar Mar 01 '23 07:03 petrochenkov

I think my preferred solution would be to find some non-keyword identifier (e.g. unsafe_attr) to avoid the syntactic can of worms.

As for expansion, my preference would be an active built-in attribute expanded in left-to-right order. (But perhaps we should start with a procedural macro for maximum future-compatibility.)

petrochenkov avatar Mar 01 '23 07:03 petrochenkov

One minor point: it's unclear from the RFC whether multiple unsafe attributes can be used within a single unsafe(...) block.

The intention was to not allow multiple attributes, but I agree that isn't clear. I will amend the RFC.


@petrochenkov I'm afraid I am not familiar enough with the rustc internals for attribute/macro expansion to properly understand, let alone answer, your questions...

I don't think unsafe expands do anything? It's simply decorating the attribute inside it, e.g. unsafe(no_mangle). no_mangle doesn't expand to anything either, does it? I am very confused by this question.

I also don't understand the issues caused by using unsafe in attribute syntax -- that's a whole separate language anyway, isn't it? Why is the keyword a problem? And why do paths come up, no_mangle or link_name aren't paths to any item either, they are just attributes with hard-coded names, aren't they?

Is unsafe supported inside cfg_attr - #[cfg_attr(TRUE, unsafe(foo))]?

Yes, as the RFC says: "we now also accept unsafe(attr) anywhere that attr can be used"

RalfJung avatar Mar 01 '23 14:03 RalfJung

During T-lang triage, I posed the question of whether we should not push to land this change across all Rust editions, and instead have it only become a change on e.g. Rust 2024.

There's tons of different ways this could be staged. That is listed as "Unresolved question". If you want me to change the RFC to suggest a different staging, just please let me know what exactly it should say -- I basically don't care.^^

Currently the RFC just says to add the lint and make it allow-by-default, and leave everything else for later.

RalfJung avatar Mar 01 '23 14:03 RalfJung

@RalfJung

that's a whole separate language anyway, isn't it? Why is the keyword a problem?

Not at all, the syntax is #[PATH(TOKENS)]/#[PATH[TOKENS]]/#[PATH{TOKENS}]/#[PATH], and only the TOKENS part is the whole separate language. (Attribute call syntax mirrors function-like macro calls almost entirely.) So there's indeed no place for a keyword like unsafe in the current grammar.

And why do paths come up, no_mangle or link_name aren't paths to any item either, they are just attributes with hard-coded names, aren't they?

Both are paths that resolve to built-in attributes, similarly to how i32 is a path that resolves to a built-in type.

petrochenkov avatar Mar 01 '23 18:03 petrochenkov

I don't think unsafe expands do anything? It's simply decorating the attribute inside it, e.g. unsafe(no_mangle). no_mangle doesn't expand to anything either, does it? I am very confused by this question.

All attributes kind of expand, some just do it trivially. #[unsafe(no_mangle)] 1) looks like an attribute call and 2) the result behaves like #[no_mangle]. So it's reasonable to assume that #[unsafe(no_mangle)] expands to #[no_mangle].

If it's a "decorator" for an attribute call, then maybe it shouldn't mimic an attribute call itself? I think something as simple as #[unsafe no_mangle] (*) would work without creating any ambiguities or complications with syntax or expansion.

(*) With unsafe? PATH DELIMITED_TOKEN_STREAM? being the grammar for the Attr thing from the reference.

petrochenkov avatar Mar 01 '23 18:03 petrochenkov

If you want me to change the RFC to suggest a different staging, just please let me know what exactly it should say -- I basically don't care.^

I think what @pnkfelix said is a good outline to me:

maybe it is always warn-by-default for Rust editions < 2024, but is a hard error to leave out the unsafe-discharge for edition >= 2024

Also, from the above comments it sounds like we might need to stare into the grammar rules together, so I'll add

@rustbot label I-lang-nominated

tmandry avatar Mar 02 '23 06:03 tmandry

Error: This repository is not enabled to use triagebot. Add a triagebot.toml in the root of the default branch to enable it.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

rustbot avatar Mar 02 '23 06:03 rustbot

If it's a "decorator" for an attribute call, then maybe it shouldn't mimic an attribute call itself?

This is the first time I hear of "attribute calls". They are not mentioned in the reference either. So I am lost here. I was thinking adding that surrounding unsafe(...) would just add a new node at the top of the tree that an attribute parses to, but clearly my mental model for attributes is way off. So while unsafe(...) seems like the most natural syntax to me, I am not at all attached to it and happy to change it to anything else.

RalfJung avatar Mar 02 '23 10:03 RalfJung

maybe it is always warn-by-default for Rust editions < 2024, but is a hard error to leave out the unsafe-discharge for edition >= 2024

FWIW earlier versions of this RFC suggested it to be warn-by-default, but people were worried that that would cause too much noise everywhere and so I changed it to start with allow-by-default.

RalfJung avatar Mar 02 '23 10:03 RalfJung