rust icon indicating copy to clipboard operation
rust copied to clipboard

Parsing issues around qualifiers and modifiers

Open fmease opened this issue 3 months ago • 9 comments

Fn vs. trait qualifiers

We fail to parse

const unsafe trait Trait {} //~ ERROR expected one of `extern` or `fn`, found keyword `trait`

and

const unsafe auto trait Trait {} //~ ERROR expected one of `extern` or `fn`, found `auto`

even though check_trait_front_matter would return true here in theory. That's because in parse_item_kind we check_fn_front_matter first which returns true once there are >=2 "fn" qualifiers (†) which most notably includes the sequence const unsafe. Two basic solutions:

  • (†): check_fn_front_matter already contains certain exceptions like unsafe extern {, async gen { and async gen move {; we could extend this with !self.is_trait_front_matter() but that's kind of awful because we would basically check the sequence const unsafe three times over. Moreover, we gonna wanna have both is_* (&self) and check_* (&mut self) variants but that means bad duplication.
  • We could move the check_trait_front_matter check above/before the fn one since it's more strict. However, I wonder if this breaks some soft invariants: check_trait_front_matter is also used elsewhere to determine whether a fn item follows. With this change, it would have "false positives". Likely not a problem in practice though.

Fn ptr ty qualifiers vs. trait bound modifiers in bare trait object tys

What's more, we also use check_fn_front_matter inside parse_ty_common for fn ptr tys but later reject const and async qualifiers in parse_fn_front_matter. Now, since that happens before we check can_begin_bound we incorrectly reject bare trait object tys that start with >=2 trait bound modifiers if they coincide with "fn" qualifiers (thanks to check_fn_front_matter yet again), i.e., async const bare trait object tys:

#![feature(const_trait_impl, async_trait_bounds)]

#[cfg(false)]
type X = (const Trait, async Trait, dyn const async Trait); // OK!

#[cfg(false)]
type Y = const async Trait;
//~^ ERROR an `fn` pointer type cannot be `const`
//~| ERROR an `fn` pointer type cannot be `async`
//~| ERROR expected one of `extern`, `fn`, `gen`, `safe`, or `unsafe`, found `Trait`

Array/slice ty vs. conditional constness inside bare trait object tys

For consistency, the following should get parsed as a bare trait object ty with a conditionally-const trait bound. However, we commit to parsing an array/slice ty before that:

#[cfg(false)] type X = dyn [const] Trait; // OK!

#[cfg(false)] type Y = [const] Trait; //~ ERROR expected identifier, found `]`

fmease avatar Sep 02 '25 14:09 fmease

I can't lie, check_fn_front_matter is beyond scuffed and gnarly with its ">=2 qualifiers rule", just awful. It doesn't compose well and certain diags suffer, too. OTOH, while check_trait_front_matter is way stricter, it's also quite illegible, verbose and otherwise hard to maintain. Qualifier parsing is hell.

fmease avatar Sep 02 '25 14:09 fmease

Technically speaking not specific to const_trait_impl but it's CTI that now suffers the consequences.

fmease avatar Sep 02 '25 14:09 fmease

I think that the right strategy for qualifier parsing is to detect that there's any qualifier and then collect qualifiers in a loop, regardless of order, and then check the order after we get to the actual item. That would let us produce good diagnostics for any combination (including duplicates) and simplify the parsing logic.

estebank avatar Sep 04 '25 17:09 estebank

I think that the right strategy for qualifier parsing is to detect that there's any qualifier and then collect qualifiers in a loop, regardless of order, and then check the order after we get to the actual item.

That would indeed be very nice but it's not that simple. const (but not const {) could start a const item, const fn or const trait (and we want to recover from const impl for impl const). const auto could be the start of a const auto trait or of a const item (e.g., const auto: () = ();), extern could be start of an extern fn, crate or a block, so we must interpret the context-sensitive keyword safe as a modifier without regressing safe as a normal identifier (for item macro calls or if we're in stmt ctxt for arbitrary exprs[^1]: safe!(), safe(), safe, safe + 1, …) and so on.

[^1]: Well, we already fail to parse safe fn f() {} as an item if we're inside of a fn body which is bad; this just goes to show how awful this all is.

fmease avatar Oct 18 '25 12:10 fmease

Tho, I guess the approach is still possible even tho it needs to be a bit more involved. I'm thinking of sth. like

fn parse_item_modifier_candidates(&mut self) -> PResult<Vec<ModifierCandidate>>;

enum ModifierCandidate {
    Auto,
    Const,
    Extern { abi: Option<Symbol> },
    Safe,
    …
}

where [ModifierCandidate::Const] would dictate that we should parse a const item const …: …; (disqualifying const fns and const traits). The question then is how much "processing" should that function do. E.g., should const auto: () = () result in [MC::Const, MC::Auto] (→ const item b/c no MC follows Auto) or in [MC::Const] (→ const item; cursor is still before normal ident auto).

fmease avatar Oct 18 '25 12:10 fmease

This could be very easily expressed using backtracking I believe (contrary to the look aheads we need to do rn) but as you know we're not meant to do that in rustc's parser esp. since snapshotting the parser is super expensive (cloning the entire parser state instead of simply resetting some cursor: usize) and should only be used for error recovery in the bad path.

Edit: Actually, it's not too bad using a bit of look-ahead in parse_item_modifier_candidates.

fmease avatar Oct 18 '25 12:10 fmease

Oh gawd, while the context-sensitive keyword auto is treated as a qualifier where possible, safe is super inflexible but I think we can be a lot more liberal and treat it as a modifier whenever it's followed by fn or extern.

For reference, this is OK (which is good):

#[cfg(false)] safe fn f() {} // OK!
#[cfg(false)] async safe fn f() {} // OK!
#[cfg(false)] const async safe fn f() {} // OK!

However, these aren't (which is bad IMO):

const safe fn f() {}
//~^ ERROR expected one of `:`, `;`, `<`, `=`, or `where`, found keyword `fn`
//~| ERROR missing type for `const` item
fn wrap() {
    safe fn f() {} //~ ERROR expected one of `!`, `.`, `::`, `;`, `?`, `{`, `}`, or an operator, found keyword `fn`
}

fmease avatar Oct 18 '25 17:10 fmease

More issues in the vicinity (I'm assuming Rust 2024 here):

  • (expr) stmt async gen || {}; doesn't parse even tho _ = async gen || {}; and for<> async gen || {}; do
  • expr const async || {} doesn't parse at all despite const || {} and async || {} being a thing (and the ordering mirroring const async fn f() {})
  • expr gen || {} doesn't parse unlike for<> gen || {}; (as an aside: it's unclear if gen || {} should exist on its own IIRC vs. gen {}; errs mentioned that once)
Overview of various closure expressions

Disclaimer: Doesn't include use or static closures.

|| {}; // ✔️
|_| {}; // ✔️
| | {}; // ✔️
move || {}; // ✔️
move |_| {}; // ✔️
gen || {}; // ❌
gen |_| {}; // ❌
gen move || {}; // ❌
gen move |_| {}; // ❌
for<> || {}; // ✔️
for<> |_| {}; // ✔️
for<> move || {}; // ✔️
for<> move |_| {}; // ✔️
for<> gen || {}; // ✔️
for<> gen |_| {}; // ✔️
for<> gen move || {}; // ✔️
for<> gen move |_| {}; // ✔️
for<> const || {}; // ✔️
for<> const |_| {}; // ✔️
for<> const move || {}; // ✔️
for<> const move |_| {}; // ✔️
for<> const gen || {}; // ❌
for<> const gen |_| {}; // ❌
for<> const gen move || {}; // ❌
for<> const gen move |_| {}; // ❌
for<> const async || {}; // ❌
for<> const async |_| {}; // ❌
for<> const async gen || {}; // ❌
for<> const async gen |_| {}; // ❌
for<> const async gen move || {}; // ❌
for<> const async gen move |_| {}; // ❌
for<> async || {}; // ✔️
for<> async |_| {}; // ✔️
for<> async move || {}; // ✔️
for<> async move |_| {}; // ✔️
for<> async gen || {}; // ✔️
for<> async gen |_| {}; // ✔️
for<> async gen move || {}; // ✔️
for<> async gen move |_| {}; // ✔️
const || {}; // ✔️
const |_| {}; // ✔️
const move || {}; // ✔️
const move |_| {}; // ✔️
const gen || {}; // ❌
const gen |_| {}; // ❌
const gen move || {}; // ❌
const gen move |_| {}; // ❌
const async || {}; // ❌
const async |_| {}; // ❌
const async gen || {}; // ❌
const async gen |_| {}; // ❌
const async gen move || {}; // ❌
const async gen move |_| {}; // ❌
async || {}; // ✔️
async |_| {}; // ✔️
async move || {}; // ✔️
async move |_| {}; // ✔️
async gen || {}; // ❌
async gen |_| {}; // ❌
async gen move || {}; // ❌
async gen move |_| {}; // ❌

fmease avatar Nov 03 '25 18:11 fmease

In PR #147864, we added another special case (hack) to the "fn frontmatter" check to support const unsafe trait Trait {} (etc.) from the issue description.

Since PR #148434, we support const impl … {} next to impl const … {} and given the delicate nature & non-obvious details of the parser I've fully outlined above, we fail to parse const unsafe impl … {} (while we do parse unsafe impl const … {}).

fmease avatar Dec 06 '25 22:12 fmease