rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

[RFC2603] Extend `<const>` to include `str` and structural constants.

Open eddyb opened this issue 2 years ago • 28 comments

I'm not sure what the process for amending RFCs is, but here goes nothing:

This is one of the last pieces of the v0 mangling, namely arbitrary constant values (for the full const generics feature). This has been tracked by https://github.com/rust-lang/rust/issues/61486, and there's some discussion there.

The main takeaway is that the mangling is structural (ADT-like tree with integer-like leaves), matching the structural equality that type-level constants are required to follow, for soundness reasons.

Accompanying implementation PRs:

  • mangling: https://github.com/rust-lang/rust/pull/87194
  • demangling: https://github.com/alexcrichton/rustc-demangle/pull/55

The summary of the added forms is:

  • e: str, followed by bytes encoded as two hex nibbles per byte
  • R/Q: &/&mut, followed by the pointee value
  • A...E: [...], containing any number of element values
  • T...E: (...), containing any number of field values
  • V: named variant/struct, followed by the constructor path and one of:
    • U: unit variant/struct (e.g. None)
    • T...E: tuple variant/struct (e.g. Some(...)), containing any number of field values
    • S...E: struct-like variant/struct (e.g. Foo { ... }), containing any number of (disambiguated-)identifier-prefixed (i.e. named) field values

Even if there may be constants in the future not covered by these forms, we can rely on the nominal V form to encode all sorts of pseudo-paths (while waiting for demanglers to support dedicated manglings), such as these hypothetical examples:

  • const::<SomeType>::h54723863eb99e89f (hashed constant, masquerading as unit struct)
  • core::mem::transmute::<usize, *mut T>(1) (function call, masquerading as tuple struct)

cc @michaelwoerister

eddyb avatar Aug 10 '21 18:08 eddyb

I'd suggest mentioning in the original RFC's text that it was amended in #3161

programmerjake avatar Aug 10 '21 20:08 programmerjake

The "changelog" section doesn't contain links (yet) and this isn't the first amendment either - should I go back and link in all of the PRs to this file in there?

eddyb avatar Aug 10 '21 20:08 eddyb

The "changelog" section doesn't contain links (yet) and this isn't the first amendment either - should I go back and link in all of the PRs to this file in there?

That sounds useful! Maybe while you're at it, change "Change LOG" to "Changelog"

programmerjake avatar Aug 10 '21 21:08 programmerjake

Based on the RFC having been a T-compiler one, let's do an FCP like that:

@rfcbot merge

eddyb avatar Aug 11 '21 07:08 eddyb

Team member @eddyb has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Aaron1011
  • [x] @davidtwco
  • [x] @eddyb
  • [x] @estebank
  • [x] @lcnr
  • [x] @matthewjasper
  • [x] @michaelwoerister
  • [x] @nagisa
  • [x] @nikomatsakis
  • [x] @oli-obk
  • [x] @petrochenkov
  • [ ] @pnkfelix
  • [x] @wesleywiser

Concerns:

  • ~~field names are unnecessary~~ resolved by https://github.com/rust-lang/rfcs/pull/3161#issuecomment-1712847885
  • ~~unit structs are unnecessary~~ resolved by https://github.com/rust-lang/rfcs/pull/3161#issuecomment-1712847885

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Aug 11 '21 07:08 rfcbot

@rfcbot concern field names are unnecessary

(I guess it's a trade off between symbol length and readability of demangled symbols.)

@rfcbot concern unit structs are unnecessary

petrochenkov avatar Aug 11 '21 16:08 petrochenkov

Nominating since we're close to being able to turn v0 on by default, and if there's spare time in the meeting, it might be good to get more discussion going and more eyes on this.

eddyb avatar Aug 11 '21 16:08 eddyb

After last weeks discussion I think the main open question is what to do with field names in struct values.

As @petrochenkov points out, from the perspective of keeping symbol names unambiguous the field names are not necessary. They would only be needed for making demangler output a bit more readable. Without them you would get

mangled: _RINvCs123_5krate3fooKVNvCs123_5krate3BarSi1Re12345678_b1_E
demangled: krate::foo<{krate::Bar { 1, "quux", true }}>

instead of

mangled: _RINvCs123_5krate3fooKVNvCs123_5krate3BarS3leni14nameRe12345678_4flagb1_E
krate::foo<{krate::Bar { len: 1, name: "quux", flag: true }}>

As the grammar is now, we unfortunately can't just make the field name optional: both <identifier> and <const> can start with an s (if the identifier has a disambiguator and the const is an i16).

That being said, I don't think we can make a decision without having numbers on how much field names cost us in terms of symbol name length/compile times/binary sizes. So my suggestion is:

  1. We merge the compiler and rustc-demangle PRs in their current form, that is, with field names.
  2. We make the new mangling scheme the default in the compiler.
  3. We implement an experimental option for the compiler to emit zero-length field names, so that we can compare the impact field names have on symbol name length.
  4. We collect numbers that allow us to make a better-informed decision. At that point we should also have a clearer picture on the performance impact of the new mangling scheme in general.
  5. Depending on the decision we make, we update this PR to either require field names or not, and merge it.

All of the above would need to happen before const generics are fully stabilized. I don't think the new mangling scheme should be blocked on this one aspect that only comes to bear when using an unstable feature.

michaelwoerister avatar Aug 19 '21 12:08 michaelwoerister

We implement an experimental option for the compiler to emit zero-length field names, so that we can compare the impact field names have on symbol name length.

FWIW there are two ways one could do that, while staying within the already-implemented syntax:

  1. use the T...E form of <const-fields>
    • demangles as Foo(...), which may be undesirable because it doesn't match the definition of Foo
  2. use the S...E form of <const-fields>, but with zero-length identifiers (mangled just a 0)
    • demangles as Foo {...} (and we can even make the demangler print /*...*/: in that case, if we want)
    • unlike the T...E form, however, it would take up one extra byte per field

If neither of these is satisfactory, we can always come up with a different letter (than U/T/S) to indicate a <const-fields> that has braces-but-unnamed fields, or even a letter other than V, for <const>, to avoid the one extra byte that is currently spent on the choice of <const-fields>.

eddyb avatar Aug 19 '21 12:08 eddyb

Discussed in T-compiler meetings, here and here

@rustbot label -I-nominated

apiraino avatar Aug 19 '21 14:08 apiraino

Error: This repository is not enabled to use triagebot. Add a triagebot.toml in the root of the master branch to enable it.

Please let @rust-lang/release know if you're having trouble with this bot.

rustbot avatar Aug 19 '21 14:08 rustbot

I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)?

Aaron1011 avatar Aug 23 '21 17:08 Aaron1011

I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)?

Correct, though they are disallowed by const generics themselves.

In general, you would need a way to know whether X<{a: T}> and X<{b: T}> are the same type, to allow const _: T generics, i.e. a version of a == b that has all the necessary properties for typesystem soundness (and so floats aren't supported for this reason either).

For edge cases that may eventually be supported (such as single-case unions), they can simply be mangled identically to a struct.

eddyb avatar Aug 24 '21 08:08 eddyb

Feedback from @Havvy elsewhere (https://github.com/rust-lang/rfcs/pull/3224#issuecomment-1053323145): we should probably not be amending RFCs.

Even if we make an exception for "fixing the RFC to reflect reality as it shipped" (https://github.com/rust-lang/rfcs/pull/3224#issuecomment-1056554594), we would still want to split this PR into two:

about half of it could be applied to the RFC itself in order to clarify the original grammar in more detail (we were too handwavey initially), with the rest being a proper extension

The first half would be roughly this (plus comments and <int-type>/<const-int> definitions):

-<const> = <type> <const-data>
+<const> = <int-type> <const-int>
+        | "b" <const-int>           // false, true
+        | "c" <const-int>           // '...'

Whereas the second half would be adding the new structural cases to <const> (and also "e" <const-str>).

eddyb avatar Mar 02 '22 08:03 eddyb

Hello checking this RFC for current status: are the raised concerns (comment) still to be untangled, has there been any progress? Is there a path to get this RFC approved or do I read correctly that the general feeling is close it?

thanks @eddyb for an update (and sorry for my hand-waving comment, can't fully grasp the context here :sweat_smile: )

apiraino avatar May 03 '23 09:05 apiraino

@apiraino This amendment is describing functionality that has been implemented for almost two years (but I believe is still nightly-only, simply because const generics with such types are still not stable).

I don't know if this should be the responsibility of people involved with mangling (@wesleywiser @michaelwoerister @ehuss, off the top of my head), or with const generics (@oli-obk @BoxyUwU @lcnr, maybe?).


Looking again at the comments above, https://github.com/rust-lang/rfcs/pull/3161#issuecomment-901879322 seems to be describing an overall plan, and of its 5 steps, step 1. happened already, and step 2. is/will be happening soon (AIUI, anyway, I haven't kept up that well).


do I read correctly that the general feeling is close it?

At most the specifics of the current form, maybe - something or other will have to be stabilized eventually, to support const generics with such types.

eddyb avatar May 10 '23 16:05 eddyb

I think @michaelwoerister's plan in https://github.com/rust-lang/rfcs/pull/3161#issuecomment-901879322 makes a lot of sense. Reducing the length of symbols by not encoding the field names as @petrochenkov suggests is very valuable especially for space constrained environments. I think the best way to determine the impact is to implement the feature with field names and then perform measurements to see what the cost actually ends up being. We may even want to allow users to decide for themselves; some users might prefer the smaller symbols and others prefer the more detailed symbols in backtraces.

@petrochenkov would it resolve your concerns if we left whether to encode field names in the symbols as an open question to be resolved before we stabilize more complex types in generics?

wesleywiser avatar Sep 08 '23 14:09 wesleywiser

@rfcbot resolve field names are unnecessary @rfcbot resolve unit structs are unnecessary

petrochenkov avatar Sep 10 '23 15:09 petrochenkov

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot avatar Sep 10 '23 15:09 rfcbot

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

rfcbot avatar Sep 20 '23 15:09 rfcbot

Is this not merged because of the merge conflict?

kennytm avatar Feb 06 '24 17:02 kennytm

Sorry, this kind of fell of my plate.

Process-wise, instead of amending RFCs, this should probably be done as a PR to update the documentation at https://github.com/rust-lang/rust/blob/master/src/doc/rustc/src/symbol-mangling/v0.md.

However, I think part of my hesitation was that I wanted to avoid documenting unstable features. IIUC, some of these consts aren't stabilized, right? I'm a little uncertain how to handle that. Is the intent that these encodings are stable before the features themselves are stabilized? If so, we can just go ahead and document them. If not, then we might want to consider a different approach of how to document them. I would suggest that the encodings just get added to the docs whenever the corresponding const features are stabilized.

I guess, part of my uncertainty was what was the intent of this RFC?

ehuss avatar Feb 06 '24 18:02 ehuss

From a pure grammar point of view, I don't see a problem making it expressive enough to enable encoding things that aren't stable yet, and then also document that. The only thing we need to make sure is that what we add to the grammar will actually be what we need for the stable version of complex const generics. But the set being proposed here looks pretty straightforward.

michaelwoerister avatar Feb 07 '24 10:02 michaelwoerister

@ehuss @michaelwoerister is the status of this RFC still uncertain? What could help to push this RFC to the finish line? I'm trying to understand if there are blockers or other things it depend on. Thanks

apiraino avatar May 13 '24 09:05 apiraino