rfcs
rfcs copied to clipboard
[RFC2603] Extend `<const>` to include `str` and structural constants.
I'm not sure what the process for amending RFCs is, but here goes nothing:
This is one of the last pieces of the v0
mangling, namely arbitrary constant values (for the full const
generics feature). This has been tracked by https://github.com/rust-lang/rust/issues/61486, and there's some discussion there.
The main takeaway is that the mangling is structural (ADT-like tree with integer-like leaves), matching the structural equality that type-level constants are required to follow, for soundness reasons.
Accompanying implementation PRs:
- mangling: https://github.com/rust-lang/rust/pull/87194
- demangling: https://github.com/alexcrichton/rustc-demangle/pull/55
The summary of the added forms is:
-
e
:str
, followed by bytes encoded as two hex nibbles per byte -
R
/Q
:&
/&mut
, followed by the pointee value -
A...E
:[...]
, containing any number of element values -
T...E
:(...)
, containing any number of field values -
V
: named variant/struct
, followed by the constructor path and one of:-
U
: unit variant/struct
(e.g.None
) -
T...E
: tuple variant/struct
(e.g.Some(...)
), containing any number of field values -
S...E
: struct-like variant/struct
(e.g.Foo { ... }
), containing any number of (disambiguated-)identifier-prefixed (i.e. named) field values
-
Even if there may be constants in the future not covered by these forms, we can rely on the nominal V
form to encode all sorts of pseudo-paths (while waiting for demanglers to support dedicated manglings), such as these hypothetical examples:
-
const::<SomeType>::h54723863eb99e89f
(hashed constant, masquerading as unitstruct
) -
core::mem::transmute::<usize, *mut T>(1)
(function call, masquerading as tuplestruct
)
cc @michaelwoerister
I'd suggest mentioning in the original RFC's text that it was amended in #3161
The "changelog" section doesn't contain links (yet) and this isn't the first amendment either - should I go back and link in all of the PRs to this file in there?
The "changelog" section doesn't contain links (yet) and this isn't the first amendment either - should I go back and link in all of the PRs to this file in there?
That sounds useful! Maybe while you're at it, change "Change LOG" to "Changelog"
Based on the RFC having been a T-compiler
one, let's do an FCP like that:
@rfcbot merge
Team member @eddyb has proposed to merge this. The next step is review by the rest of the tagged team members:
- [x] @Aaron1011
- [x] @davidtwco
- [x] @eddyb
- [x] @estebank
- [x] @lcnr
- [x] @matthewjasper
- [x] @michaelwoerister
- [x] @nagisa
- [x] @nikomatsakis
- [x] @oli-obk
- [x] @petrochenkov
- [ ] @pnkfelix
- [x] @wesleywiser
Concerns:
- ~~field names are unnecessary~~ resolved by https://github.com/rust-lang/rfcs/pull/3161#issuecomment-1712847885
- ~~unit structs are unnecessary~~ resolved by https://github.com/rust-lang/rfcs/pull/3161#issuecomment-1712847885
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
@rfcbot concern field names are unnecessary
(I guess it's a trade off between symbol length and readability of demangled symbols.)
@rfcbot concern unit structs are unnecessary
Nominating since we're close to being able to turn v0
on by default, and if there's spare time in the meeting, it might be good to get more discussion going and more eyes on this.
After last weeks discussion I think the main open question is what to do with field names in struct values.
As @petrochenkov points out, from the perspective of keeping symbol names unambiguous the field names are not necessary. They would only be needed for making demangler output a bit more readable. Without them you would get
mangled: _RINvCs123_5krate3fooKVNvCs123_5krate3BarSi1Re12345678_b1_E
demangled: krate::foo<{krate::Bar { 1, "quux", true }}>
instead of
mangled: _RINvCs123_5krate3fooKVNvCs123_5krate3BarS3leni14nameRe12345678_4flagb1_E
krate::foo<{krate::Bar { len: 1, name: "quux", flag: true }}>
As the grammar is now, we unfortunately can't just make the field name optional: both <identifier>
and <const>
can start with an s
(if the identifier has a disambiguator and the const is an i16
).
That being said, I don't think we can make a decision without having numbers on how much field names cost us in terms of symbol name length/compile times/binary sizes. So my suggestion is:
- We merge the compiler and rustc-demangle PRs in their current form, that is, with field names.
- We make the new mangling scheme the default in the compiler.
- We implement an experimental option for the compiler to emit zero-length field names, so that we can compare the impact field names have on symbol name length.
- We collect numbers that allow us to make a better-informed decision. At that point we should also have a clearer picture on the performance impact of the new mangling scheme in general.
- Depending on the decision we make, we update this PR to either require field names or not, and merge it.
All of the above would need to happen before const generics are fully stabilized. I don't think the new mangling scheme should be blocked on this one aspect that only comes to bear when using an unstable feature.
We implement an experimental option for the compiler to emit zero-length field names, so that we can compare the impact field names have on symbol name length.
FWIW there are two ways one could do that, while staying within the already-implemented syntax:
- use the
T...E
form of<const-fields>
- demangles as
Foo(...)
, which may be undesirable because it doesn't match the definition ofFoo
- demangles as
- use the
S...E
form of<const-fields>
, but with zero-length identifiers (mangled just a0
)- demangles as
Foo {...}
(and we can even make the demangler print/*...*/:
in that case, if we want) - unlike the
T...E
form, however, it would take up one extra byte per field
- demangles as
If neither of these is satisfactory, we can always come up with a different letter (than U
/T
/S
) to indicate a <const-fields>
that has braces-but-unnamed fields, or even a letter other than V
, for <const>
, to avoid the one extra byte that is currently spent on the choice of <const-fields>
.
Error: This repository is not enabled to use triagebot.
Add a triagebot.toml
in the root of the master branch to enable it.
Please let @rust-lang/release
know if you're having trouble with this bot.
I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)?
I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)?
Correct, though they are disallowed by const
generics themselves.
In general, you would need a way to know whether X<{a: T}>
and X<{b: T}>
are the same type, to allow const _: T
generics, i.e. a version of a == b
that has all the necessary properties for typesystem soundness (and so floats aren't supported for this reason either).
For edge cases that may eventually be supported (such as single-case union
s), they can simply be mangled identically to a struct
.
Feedback from @Havvy elsewhere (https://github.com/rust-lang/rfcs/pull/3224#issuecomment-1053323145): we should probably not be amending RFCs.
Even if we make an exception for "fixing the RFC to reflect reality as it shipped" (https://github.com/rust-lang/rfcs/pull/3224#issuecomment-1056554594), we would still want to split this PR into two:
about half of it could be applied to the RFC itself in order to clarify the original grammar in more detail (we were too handwavey initially), with the rest being a proper extension
The first half would be roughly this (plus comments and <int-type>
/<const-int>
definitions):
-<const> = <type> <const-data>
+<const> = <int-type> <const-int>
+ | "b" <const-int> // false, true
+ | "c" <const-int> // '...'
Whereas the second half would be adding the new structural cases to <const>
(and also "e" <const-str>
).
Hello checking this RFC for current status: are the raised concerns (comment) still to be untangled, has there been any progress? Is there a path to get this RFC approved or do I read correctly that the general feeling is close it?
thanks @eddyb for an update (and sorry for my hand-waving comment, can't fully grasp the context here :sweat_smile: )
@apiraino This amendment is describing functionality that has been implemented for almost two years (but I believe is still nightly-only, simply because const
generics with such types are still not stable).
I don't know if this should be the responsibility of people involved with mangling (@wesleywiser @michaelwoerister @ehuss, off the top of my head), or with const generics (@oli-obk @BoxyUwU @lcnr, maybe?).
Looking again at the comments above, https://github.com/rust-lang/rfcs/pull/3161#issuecomment-901879322 seems to be describing an overall plan, and of its 5 steps, step 1. happened already, and step 2. is/will be happening soon (AIUI, anyway, I haven't kept up that well).
do I read correctly that the general feeling is close it?
At most the specifics of the current form, maybe - something or other will have to be stabilized eventually, to support const
generics with such types.
I think @michaelwoerister's plan in https://github.com/rust-lang/rfcs/pull/3161#issuecomment-901879322 makes a lot of sense. Reducing the length of symbols by not encoding the field names as @petrochenkov suggests is very valuable especially for space constrained environments. I think the best way to determine the impact is to implement the feature with field names and then perform measurements to see what the cost actually ends up being. We may even want to allow users to decide for themselves; some users might prefer the smaller symbols and others prefer the more detailed symbols in backtraces.
@petrochenkov would it resolve your concerns if we left whether to encode field names in the symbols as an open question to be resolved before we stabilize more complex types in generics?
@rfcbot resolve field names are unnecessary @rfcbot resolve unit structs are unnecessary
:bell: This is now entering its final comment period, as per the review above. :bell:
The final comment period, with a disposition to merge, as per the review above, is now complete.
As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.
This will be merged soon.
Is this not merged because of the merge conflict?
Sorry, this kind of fell of my plate.
Process-wise, instead of amending RFCs, this should probably be done as a PR to update the documentation at https://github.com/rust-lang/rust/blob/master/src/doc/rustc/src/symbol-mangling/v0.md.
However, I think part of my hesitation was that I wanted to avoid documenting unstable features. IIUC, some of these consts aren't stabilized, right? I'm a little uncertain how to handle that. Is the intent that these encodings are stable before the features themselves are stabilized? If so, we can just go ahead and document them. If not, then we might want to consider a different approach of how to document them. I would suggest that the encodings just get added to the docs whenever the corresponding const features are stabilized.
I guess, part of my uncertainty was what was the intent of this RFC?
From a pure grammar point of view, I don't see a problem making it expressive enough to enable encoding things that aren't stable yet, and then also document that. The only thing we need to make sure is that what we add to the grammar will actually be what we need for the stable version of complex const generics. But the set being proposed here looks pretty straightforward.
@ehuss @michaelwoerister is the status of this RFC still uncertain? What could help to push this RFC to the finish line? I'm trying to understand if there are blockers or other things it depend on. Thanks
ping @ehuss @michaelwoerister about status for this RFC. Thanks :-)
This depends on two things, as far as I can tell:
- Do we think that the non-trivial const generic values (as listed in the original message) are stable enough to stabilize their encoding in the mangling scheme? This is something that @rust-lang/project-const-generics could answer best.
- Decide whether we want to emit field names or not. In a comment above I suggested comparing symbol name lengths and binary sizes with and without field names, but the problem with that idea is that we don't have a good test case because there are no real-world codebases that use
#![feature(adt_const_params)]
to a significant extent. I propose to just add another alternative to<const-fields>
that is a struct without field names (as suggested by @eddyb as one option here):
That comes at no extra cost to symbol name length and at basically no cost to mangling/demangling complexity. The compiler can then decide to use one form or the other, and we can simply change that decision after<const-fields> = "U" // X | "T" {<const>} "E" // X(a, b, c, ...) | "S" {<const>} "E" // X { value, ... } | "N" {<identifier> <const>} "E" // X { field: value, ... }
adt_const_params
has been stabilized and we have real-world data about the impact.
The main point of instability with adt_const_params
is the handling of padding and addresses, all of which is caused by allowing &
in const parameters' types. For everything other than references I think we have pretty much hit the point where nothing is likely to change and we'd have no problem with this encoding.
When references are involved there's data that we currently don't encode that we might want to start doing at some point, these are solely issues with allowing references in const generics and do not occur otherwise. In general see rust-lang/rust#120961 for an overview of what we're not currently representing in values for const generics.
Some of the stuff we might want to start representing in const generic values would work perfectly fine in this scheme as is (e.g. padding bytes and what static a borrow is a reference to). Other stuff I have no idea how we would even handle mangling (e.g. bytes with provenance and more generally representing borrows as having their pointee be arbitrary bytes).
The stuff that wouldn't work under this scheme I don't know how we could make it work under any scheme so I think that largely does not pose an issue for this RFC. My only real point would be that if we wind up forbidding references in const parameter types, the R
/Q
syntax would wind up effectively unused. I don't know if that's okay or not, if it isn't we may want to remove it from the RFC ^^
edit: sorry for any ghost pings, I originally had this message contain some other information but removed it
Thanks for the info, @BoxyUwU!
- I think it's fine to erase information from the symbol-mangling representation of a const value as long as that can't lead to ambiguities. E.g. we don't need to be able to reconstruct the padding of a value, as long as there can't be two instances
foo<X>
andfoo<Y>
that only differ in padding. - provenance information might be a similar case: we probably don't need to represent that in the symbol name.
- in general, there is always the option to add an opaque disambiguator to distinguish between things that have no difference at the surface level. That could take care of padding.
- We could have an "opaque value" alternative for
<const-fields>
that is just a sufficiently wide hash of the value. - statics in
const_refs_to_static
could be represented by their path. We'd have to see how that fits into the grammar. If this is likely to be stabilized then it would be good to handle the case now.
Overall though, the fewer opaque hash values the better.