gc
gc copied to clipboard
Extending the Name section for struct fields
For an improved debugging experience, it would be desirable to have debuggers show names for struct fields, so that developers can map objects fields they see in a Wasm debugger back to the class/etc member fields they defined in their source language. Extending the custom name section for this purpose seems straightforward: based on the extended-name-section proposal, I assume we will add:
id = 10, fieldnamesubsec ::= namesubsection_10 (indirectnamemap)
i.e. same structure as the "local names" subsection, just with different ID and meaning. Two-level index: type_index + field_index.
I understand the motivation, but how are you envisioning this working? Remember that this MVP is specifically designed so that types (and their fields) are identified with their structures. This means that Integer
and all enum
references in Java have the same compile-time and run-time type in the MVP, and all Java reference arrays have the same compile-time and run-time types. So you could easily have the same field of the same type being given multiple names within the same module, let alone across modules.
It's up to toolchains to decide what to put in there. An approximate name is better than no name at all (or an auto-generated name like "field0"
).
Given our current prototype implementation (where automaton-minimization-and-hashing isn't implemented yet), a toolchain can simply emit distinct static types for distinct source types. Time will tell whether this will change... but even if it does: an approximate name, or partial naming information, is better than no names at all.
Also, it's an optional section, and all entries in it are optional too. If a toolchain doesn't want to bother with it (in general, or for a given struct/array type, or a given field in that type), it can just skip it. Code will still work, only the debugging experience will be degraded.
Also, the extended-name-section proposal already includes names for types (as a whole), and if we have that, it seems obvious that we also want names for the internal elements of these types.
This is 100% pragmatism. I'm working on allowing people (=toolchain authors, and toolchain-using application authors) to actually try this WasmGC thing that we're trying to build here. For that, having debugging information is immensely helpful. And before privately implementing some made-up behavior, I wanted to publicly post about it so the idea is documented.
Given our current prototype implementation (where automaton-minimization-and-hashing isn't implemented yet), a toolchain can simply emit distinct static types for distinct source types.
This was one of the suggestions I had in mind, but I wasn't sure if it would meet the pragmatic needs y'all had in hand as it would be much worse than even basic debuggers. Nonetheless, it might be the best option available. It's also worth noting that this relies on explicitly violating the expectation that generators will minimize the types used in modules.
In my Java compiler in the debug mode (which write the name section) every Java type has one related wasm type. There is a 1:1 mapping.
With the productive mode the types will be reduced to unique wasm types and no name section will be write. That I see no problem with writing field information.
@rossberg, this is where the struct field naming used in Binaryen was proposed. Can you elaborate on the issues you see with it?
@tlively, no, the name section is unrelated. What I was referring to was uses of duplicate symbolic field names I encountered in the J2CL example's .wat file, e.g., like this:
(type $s1 (struct (field $foo i32) (field $bar i32)))
(type $s2 (struct (field $bar i32) (field $baz i32)))
Here, $bar
is bound in two different structs, with different values. The reference interpreter rejects duplicate bindings for the same symbolic field name.
Resolving such ambiguities can currently be made to work because field-related instructions annotate the struct type as well:
struct.get $s1 $bar
So by doing dependent lookup this becomes possible.
However, the plan of record is to get rid of the annotation $s1
here, in which case the $bar
becomes syntactically ambiguous. It could not be resolved without type checking. So having dependent field name lookup is not gonna work moving forward.
To process the text files using dependent field names, I hacked the interpreter to rewrite field names $x
used in the context of a type name $t
to $t::$x
. But that doesn't always work (e.g., when the type is unnamed).
Got it, I had forgotten or not known that the plan of record was to remove those type annotations. Assuming we don't want to have to track types while parsing the text format, I see that duplicate field names would be a problem. I'll work on getting this fixed in Binaryen.