rust icon indicating copy to clipboard operation
rust copied to clipboard

Add encoding for `f16` and `f128`

Open tgross35 opened this issue 4 months ago • 8 comments

This establishes new character encodings for the new primitive types tracked at https://github.com/rust-lang/rust/issues/116909:

  • k for f16
  • q for f128

We cannot easily be consistent with Itanium because it uses a multi-character encoding for _Float16 (Dh), which I believe would conflict with dyn anyway. We could use Itanium's g for f128, but q is more consistent with f=float=f32, d=double=f64, q=quad=f128 (unfortunately h for half/f16 is already taken).

Per @michaelwoerister this will need a compiler FCP

See also previous discussion at https://github.com/rust-lang/rustc-demangle/pull/64. Currently, the compiler will just ICE if attempting to v0 mangle these types.

@rustbot label +T-compiler +needs-fcp

Cc @rcvalle @ehuss

tgross35 avatar Mar 06 '24 20:03 tgross35

r? @JohnTitor

rustbot has assigned @JohnTitor. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot avatar Mar 06 '24 20:03 rustbot

r? @michaelwoerister

tgross35 avatar Mar 06 '24 20:03 tgross35

This PR reserves space in the v0 symbol mangling grammar for the new f16 and f128 primitive types. The change to the grammar is backwards compatible and straightforward. Since the change is public-facing (i.e. it needs to be picked up by downstream tooling), let's do an FCP.

@rfcbot merge

Thanks for opening the PR, @tgross35!

michaelwoerister avatar Mar 07 '24 09:03 michaelwoerister

Team member @michaelwoerister has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Aaron1011
  • [x] @cjgillot
  • [x] @compiler-errors
  • [x] @davidtwco
  • [ ] @eddyb
  • [x] @estebank
  • [ ] @jackh726
  • [x] @lcnr
  • [x] @matthewjasper
  • [x] @michaelwoerister
  • [x] @nagisa
  • [x] @oli-obk
  • [x] @petrochenkov
  • [ ] @pnkfelix
  • [ ] @wesleywiser

Concerns:

  • Compatibility with existing tools (https://github.com/rust-lang/rust/pull/122106#issuecomment-2037362280)

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Mar 07 '24 09:03 rfcbot

What happens when attempting to demangle symbols that use the new encodings with an older version of the demangling code? Since we have some level of support for v0 mangling in the wild, we should consider how making this change will affect existing tooling.

wesleywiser avatar Mar 08 '24 02:03 wesleywiser

What happens when attempting to demangle symbols that use the new encodings with an older version of the demangling code? Since we have some level of support for v0 mangling in the wild, we should consider how making this change will affect existing tooling.

That is a good question, the docs don't seem to call out any specific forward compatibility or error handling. It looks like rustc-demangle just gives up and errors in these cases, which likely is the same for most implementations.

I think we can probably live with the fact that existing demanglers will just error until they can update support. To me this does not feel like a breaking change because it does not change anything that currently demangles correctly.

It is probably still reasonable to try to make sure at least known demanglers support these symbols before stabilizing the primitives, I added it to the todo list https://github.com/rust-lang/rust/issues/116909#issuecomment-1969554030.

We could also update documentation to reserve a set of characters for future primitives that demanglers should gracefully handle (??) to help in case we add anything like bf16/decimal32/i256/bitintN`, but I don't think this is worth it.

tgross35 avatar Mar 08 '24 04:03 tgross35

Q: does the k cover all different formats (e.g. bfloat16) of 16-bit float, or just a specific one?

nagisa avatar Mar 09 '24 12:03 nagisa

Q: does the k cover all different formats (e.g. bfloat16) of 16-bit float, or just a specific one?

I think ideally it would be different, leaving k only for binary16 f16. Brain float would need to be monomorphized separately so it would probably make sense as a different symbol. Itanium uses DF16b.

tgross35 avatar Mar 10 '24 22:03 tgross35

Yes, it's likely that existing demanglers will fail if they encounter something unknown in the grammar. If we had to start from scratch with a new mangling scheme, then forward compatibility would be on my set of requirements. As things are now, I think the best approach is to document new additions early (before they are emitted by a stable compiler) and try to get support merged in external tooling during that still-unstable window.

michaelwoerister avatar Mar 11 '24 08:03 michaelwoerister

I somewhat regret reproducing the Itanium primitive encoding, it doesn't scale well, so its only benefit is compactly mentioning a lot of primitives (more relevant to C++ overloads than to Rust).


I have nothing against the reservation but I should point out there are alternatives.

If today rustc always generates non-zero crate disambiguators, then C3f16 would demangle correctly, and never conflict with a crate named f16.

While less pretty, we could also reserve special "namespaces" (using crate name+disambiguator values impossible to arise for real crates), or even give new primitives full identities in libcore using lang items.

eddyb avatar Mar 29 '24 01:03 eddyb

I somewhat regret reproducing the Itanium primitive encoding, it doesn't scale well, so its only benefit is compactly mentioning a lot of primitives (more relevant to C++ overloads than to Rust).

I have nothing against the reservation but I should point out there are alternatives.

If today rustc always generates non-zero crate disambiguators, then C3f16 would demangle correctly, and never conflict with a crate named f16.

While less pretty, we could also reserve special "namespaces" (using crate name+disambiguator values impossible to arise for real crates), or even give new primitives full identities in libcore using lang items.

Is your position neutral here, or would you prefer investigating the above at this point?

I do think there is some nice consistency in saying that all u, i, and f primitives <= 128 bits have similar representations (as proposed). Any larger types (i256, u256, f256) or anything not following this pattern (bf16) would then be a reasonable place to switch to a more generic representation.


@wesleywiser were the above answers reasonable enough, or did you have any further concerns? For what it is worth, one demangler has already picked up the additions https://github.com/rust-lang/rust/issues/116909#issuecomment-2029280724.

tgross35 avatar Apr 01 '24 08:04 tgross35

It seems like giving these primitives full identities in libcore is pretty cheap (we'll want methods on them anyway) and presumably means that there's no 3-5 year period while tooling starts to support demangling symbols with them present?

That feels like a much better option to me, the only thing we're gaining by doing the current reservation approach seems to be a bit of efficiency in symbol length, right? That seems like a pretty minor win (symbols are already pretty long).

I'm still constantly on systems that don't demangle baseline v0 out of the box in perf or other tooling, so this would essentially reset the clock (albeit only for code using these types...).

Mark-Simulacrum avatar Apr 01 '24 12:04 Mark-Simulacrum

If the expectation is that there'll be more additions like this (e.g. the bf16 type mentioned) then I would also prefer emitting regular paths (e.g. to core::primitives aliases). That would allow currently demanglers to decode symbols containing the new types. If we decide to do that here, that should also be the policy for all similar cases going forward.

michaelwoerister avatar Apr 02 '24 09:04 michaelwoerister

I'd like to echo the above points as a soft concern (I'm not convinced to check my box, but wouldn't block if it would otherwise land).

I think if we foresee future changes like this, we should consider that doing this dance every time (FCPing, waiting for downstream support, thinking about backwards-compatibility, etc.) is not ideal and that there may be better approaches (like emitting regular paths mentioned above).

jackh726 avatar Apr 04 '24 14:04 jackh726

Given the above conversation I guess I'm going to file a concern on my own FCP 🙂

@rfcbot concern Compatibility with existing tools

The extremely long latency before downstream tooling like perf or gdb can pick up changes to the grammar has been by far the main obstacle to using the v0 scheme by default. The change to the grammar here is minimal but both @eddyb's C3f16 proposal and using a full path to something in libcore are good alternatives -- especially when considering that these types won't be very common in symbol names.

If noboby objects, I'm going to cancel the FCP and suggest that we just use C3f16 for now (which does not require a grammar change or an FCP)

michaelwoerister avatar Apr 04 '24 14:04 michaelwoerister

Here's a proposal for solving these kinds of problems in a more structured way: https://github.com/rust-lang/compiler-team/issues/737

michaelwoerister avatar Apr 04 '24 17:04 michaelwoerister

I'm seeing no objections to cancelling the FCP, so: @rfcbot fcp cancel

I think MCP 737 provides a good, general solution to evolving the mangling scheme without too much breakage.

michaelwoerister avatar Apr 10 '24 12:04 michaelwoerister

@michaelwoerister proposal cancelled.

rfcbot avatar Apr 10 '24 12:04 rfcbot

Thanks everyone for the feedback, I have opened https://github.com/rust-lang/rust/pull/123816 to supercede this.

tgross35 avatar Apr 11 '24 18:04 tgross35