rust
rust copied to clipboard
Add encoding for `f16` and `f128`
This establishes new character encodings for the new primitive types tracked at https://github.com/rust-lang/rust/issues/116909:
-
k
forf16
-
q
forf128
We cannot easily be consistent with Itanium because it uses a multi-character encoding for _Float16
(Dh
), which I believe would conflict with dyn
anyway. We could use Itanium's g
for f128
, but q
is more consistent with f
=float
=f32
, d
=double
=f64
, q
=quad
=f128
(unfortunately h
for half
/f16
is already taken).
Per @michaelwoerister this will need a compiler FCP
See also previous discussion at https://github.com/rust-lang/rustc-demangle/pull/64. Currently, the compiler will just ICE if attempting to v0 mangle these types.
@rustbot label +T-compiler +needs-fcp
Cc @rcvalle @ehuss
r? @JohnTitor
rustbot has assigned @JohnTitor. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
r? @michaelwoerister
This PR reserves space in the v0 symbol mangling grammar for the new f16
and f128
primitive types. The change to the grammar is backwards compatible and straightforward. Since the change is public-facing (i.e. it needs to be picked up by downstream tooling), let's do an FCP.
@rfcbot merge
Thanks for opening the PR, @tgross35!
Team member @michaelwoerister has proposed to merge this. The next step is review by the rest of the tagged team members:
- [x] @Aaron1011
- [x] @cjgillot
- [x] @compiler-errors
- [x] @davidtwco
- [ ] @eddyb
- [x] @estebank
- [ ] @jackh726
- [x] @lcnr
- [x] @matthewjasper
- [x] @michaelwoerister
- [x] @nagisa
- [x] @oli-obk
- [x] @petrochenkov
- [ ] @pnkfelix
- [ ] @wesleywiser
Concerns:
- Compatibility with existing tools (https://github.com/rust-lang/rust/pull/122106#issuecomment-2037362280)
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
What happens when attempting to demangle symbols that use the new encodings with an older version of the demangling code? Since we have some level of support for v0 mangling in the wild, we should consider how making this change will affect existing tooling.
What happens when attempting to demangle symbols that use the new encodings with an older version of the demangling code? Since we have some level of support for v0 mangling in the wild, we should consider how making this change will affect existing tooling.
That is a good question, the docs don't seem to call out any specific forward compatibility or error handling. It looks like rustc-demangle
just gives up and errors in these cases, which likely is the same for most implementations.
I think we can probably live with the fact that existing demanglers will just error until they can update support. To me this does not feel like a breaking change because it does not change anything that currently demangles correctly.
It is probably still reasonable to try to make sure at least known demanglers support these symbols before stabilizing the primitives, I added it to the todo list https://github.com/rust-lang/rust/issues/116909#issuecomment-1969554030.
We could also update documentation to reserve a set of characters for future primitives that demanglers should gracefully handle (??
) to help in case we add anything like bf16
/decimal32
/i256/
bitintN`, but I don't think this is worth it.
Q: does the k
cover all different formats (e.g. bfloat16) of 16-bit float, or just a specific one?
Q: does the
k
cover all different formats (e.g. bfloat16) of 16-bit float, or just a specific one?
I think ideally it would be different, leaving k
only for binary16 f16
. Brain float would need to be monomorphized separately so it would probably make sense as a different symbol. Itanium uses DF16b
.
Yes, it's likely that existing demanglers will fail if they encounter something unknown in the grammar. If we had to start from scratch with a new mangling scheme, then forward compatibility would be on my set of requirements. As things are now, I think the best approach is to document new additions early (before they are emitted by a stable compiler) and try to get support merged in external tooling during that still-unstable window.
I somewhat regret reproducing the Itanium primitive encoding, it doesn't scale well, so its only benefit is compactly mentioning a lot of primitives (more relevant to C++ overloads than to Rust).
I have nothing against the reservation but I should point out there are alternatives.
If today rustc
always generates non-zero crate disambiguators, then C3f16
would demangle correctly, and never conflict with a crate named f16
.
While less pretty, we could also reserve special "namespaces" (using crate name+disambiguator values impossible to arise for real crates), or even give new primitives full identities in libcore using lang items.
I somewhat regret reproducing the Itanium primitive encoding, it doesn't scale well, so its only benefit is compactly mentioning a lot of primitives (more relevant to C++ overloads than to Rust).
I have nothing against the reservation but I should point out there are alternatives.
If today
rustc
always generates non-zero crate disambiguators, thenC3f16
would demangle correctly, and never conflict with a crate namedf16
.While less pretty, we could also reserve special "namespaces" (using crate name+disambiguator values impossible to arise for real crates), or even give new primitives full identities in libcore using lang items.
Is your position neutral here, or would you prefer investigating the above at this point?
I do think there is some nice consistency in saying that all u
, i
, and f
primitives <= 128 bits have similar representations (as proposed). Any larger types (i256
, u256
, f256
) or anything not following this pattern (bf16
) would then be a reasonable place to switch to a more generic representation.
@wesleywiser were the above answers reasonable enough, or did you have any further concerns? For what it is worth, one demangler has already picked up the additions https://github.com/rust-lang/rust/issues/116909#issuecomment-2029280724.
It seems like giving these primitives full identities in libcore is pretty cheap (we'll want methods on them anyway) and presumably means that there's no 3-5 year period while tooling starts to support demangling symbols with them present?
That feels like a much better option to me, the only thing we're gaining by doing the current reservation approach seems to be a bit of efficiency in symbol length, right? That seems like a pretty minor win (symbols are already pretty long).
I'm still constantly on systems that don't demangle baseline v0 out of the box in perf or other tooling, so this would essentially reset the clock (albeit only for code using these types...).
If the expectation is that there'll be more additions like this (e.g. the bf16
type mentioned) then I would also prefer emitting regular paths (e.g. to core::primitives aliases). That would allow currently demanglers to decode symbols containing the new types. If we decide to do that here, that should also be the policy for all similar cases going forward.
I'd like to echo the above points as a soft concern (I'm not convinced to check my box, but wouldn't block if it would otherwise land).
I think if we foresee future changes like this, we should consider that doing this dance every time (FCPing, waiting for downstream support, thinking about backwards-compatibility, etc.) is not ideal and that there may be better approaches (like emitting regular paths mentioned above).
Given the above conversation I guess I'm going to file a concern on my own FCP 🙂
@rfcbot concern Compatibility with existing tools
The extremely long latency before downstream tooling like perf
or gdb
can pick up changes to the grammar has been by far the main obstacle to using the v0 scheme by default. The change to the grammar here is minimal but both @eddyb's C3f16
proposal and using a full path to something in libcore are good alternatives -- especially when considering that these types won't be very common in symbol names.
If noboby objects, I'm going to cancel the FCP and suggest that we just use C3f16
for now (which does not require a grammar change or an FCP)
Here's a proposal for solving these kinds of problems in a more structured way: https://github.com/rust-lang/compiler-team/issues/737
I'm seeing no objections to cancelling the FCP, so: @rfcbot fcp cancel
I think MCP 737 provides a good, general solution to evolving the mangling scheme without too much breakage.
@michaelwoerister proposal cancelled.
Thanks everyone for the feedback, I have opened https://github.com/rust-lang/rust/pull/123816 to supercede this.