gc icon indicating copy to clipboard operation
gc copied to clipboard

Universal pointer alignment and variable-length tagging schemes (Post-MVP)

Open ouillie opened this issue 3 years ago • 5 comments

This would be a post-MVP development, so my apologies if this is the wrong place to discuss it, but I think it's interesting to think about while designing the MVP for posterity.

The proposal specifically talks about wanting to support uniform representation with i31ref. Well, uniform representation is conducive to uniform pointer alignment; if I'm representing everything as a 64-bit number (possibly a pointer in a 64-bit address space), then aligning all allocations to 8-byte boundaries is an acceptable way to keep things simple (especially passing pointers to modules that expect C-style alignment rules for 64-bit values).

Since WebAssembly assumes addressability at byte-granularity, a compiler that enforces uniform pointer alignment would essentially have 2 "unused" (i.e. taggable) pointer bits on wasm32, and 3 such bits on wasm64. The proposed i31ref only makes use of 1 of those bits on wasm32.

I feel that, if one of the main motivations for i31ref is uniform representation, then good support for uniform alignment should also be a goal. The paragraph on Variants in the Post-MVP section describes a way to achieve decent support using a high-level abstraction, however I believe more control over the tagging scheme would be particularly useful for wasm64, which could enable beneficial use of variable-length tagging schemes.

To illustrate the possibilities in general terms, consider a fixed-length encoding of N bits. I'll use N=8 as an example. This can encode 2^N=256 different values.

Now, to make it a variable-length encoding, we can split the bits along an axis. For example, we could pick the axis X=3 — the boundary between the 3rd and 4th bits — and visualize it like UUUUULLL where U is the "upper" portion and L is the "lower portion". The lower portion can make use of all but one of it's potential values (in this case, 2^3-1=7) to encode "short" values, using the remaining lower portion value to indicate that the upper portion is also significant, adding an additional 2^5=32 "big" values.

In general, where a fixed-length encoding of N bits has 2^N possible values, a variable-length encoding of up to N bits along an axis X has 2^(X)-1 + 2^(N-X) possible values, but 2^(X)-1 of those values are "short" and require less space to encode.

We can also use multiple axes. For instance, with two axes X1=2 and X2=3, we would split up our 8-bit tag like UUUUUMLL with an additional "middle" portion of X2-X1=1 bit(s). Now, we can encode up to 2^X1-1=3 "short" values, up to 2^(X2-X1)-1=1 "medium" values, and up to 2^(N-X2)=32 "big" values.

In general, a variable-length encoding of N bits along multiple axes X1, X2, ... XM where axes are strictly increasing positive values less than N would have up to 2^(X1)-1 + 2^(X2-X1)-1 + 2^(X3-X2)-1 + ... + 2^(N-XM) possible values, with M+1 different possible encoding lengths.

We can apply this general approach to variable-length encodings to tagging schemes with universal representation using C-bit words. For each possible tag, the variable length of the encoded tag is subtracted from the constant length C of the word to get the effective bit length for each possible unboxed type (tag). For instance, a 2-bit tag leaves C-2 bits for values of the corresponding unboxed type.

With universal representation (C-bit numbers) and universal pointer alignment, a known number of bits Y (2 for 4-byte-aligned, 3 for 8-byte-aligned, etc.) is usable for tagging without practical loss of address space. This effectively enforces an axis at Y, since we need fully C-Y bits to represent all possible aligned pointers. In theory, additional axes could also be used on either side of the Y axis.

As a concrete example, picking C=64, Y=3, and N=8 as the maximum tag size, we can add an additional axis X1=2 on the lower side of Y. This gives us:

  • 2^(X1)-1=3 short tags of 2 bits each, letting all three types represented by those tags use the remaining C-X1=62 bits for the unboxed value.
  • A single (2^(Y-X1)-1=1) medium tag of 3 bits, used for 8-byte-aligned pointers (61 bits each).
  • 2^(N-Y)=32 long tags of 8 bits each, letting all thirty-two types represented by those tags use the remaining C-N=56 bits for the unboxed value.

This concrete example would be desirable if a language wants 62 bits to represent a small number of primitive types, but has a larger number of primitive types for which 56 bits is enough (e.g. UTF-32 characters, file descriptors, timestamps etc.). I believe this is desirable enough in general to warrant the ability to specify it as such.

To implement this via a high-level abstraction, a module could define the variant type for tagged, aligned pointers in terms of alignment (Y), as well as any desired "lower" axes (X1, ..., XM where each axis is less than Y), "upper" axes (Z1, ..., ZL where each is greater than Y), and an ordered list of K unboxed variant types. The word size C is determined by the pointer size (e.g. C=32 for wasm32). The ordered list of K unboxed variant types is mapped to variable-length tag values with the first 2^(X1)-1 variants mapping to short tags of X1 bits, the following 2^(X2-X1)-1 variants mapping to less-short tags of X2 bits, and so on until all the variants are mapped to tags, with the engine reserving one of the Y-bit tags (Y being the only required axis) for pointers (boxed types).

Consider these edge cases:

  • C=64 Y=3 K=1: 64-bit address space with 8-byte-aligned pointers and only one type of unboxed primitive indicated via tagging. This would not make effective use of variable-length tags since no additional axes are specified. In theory, the single primitive type could use up to 63 bits, but since the module did not specify any desired lower axes, the engine guarantees only up to C-Y=61 bits for all values.
  • C=64 X1=1 Y=3 K=1: Similar to above, but with a lower axis at 1. This would guarantee up to 63 bits for the single primitive type.
  • C=64 X1=1 X2=2 Y=3 K=2: Two unboxed types and two lower axes, guaranteeing 63 bits for the first unboxed type and 62 bits for the second.
  • C=64 X1=2 Y=3 K=3: Three unboxed types and one lower axis, guaranteeing 62 bits for all three unboxed types.
  • C=64 X1=2 Y=3 K=20: Twenty unboxed types and one lower axis, guaranteeing 62 bits for the first three unboxed types. That leaves us with 17 unboxed types that must be encoded above the Y axis, requiring up to length-8 tags (N=Y+CEILING(LOG(17))=8 where LOG is base-2) and guaranteeing up to 56 bits for the remaining types' values.
  • C=64 X1=1 Y=3 K=20: Twenty unboxed types and one lower axis, guaranteeing 63 bits for the first unboxed type, up to 61 bits for the next two unboxed types (and also 61 bits for pointers), and — as above — up to 56 bits for the remaining types' values.
  • C=64 X1=1 Y=3 Z1=5 K=20: Twenty unboxed types, one lower axis, and one upper axis. This guarantees 63 bits for the first unboxed type, 61 bits for the next two unboxed types (and also 61 bits for pointers), 59 bits for the next 2^(5-3)-1=3 unboxed types, and up to 55 bits (N=Z1+CEILING(LOG(11))=9) for the remaining 11 unboxed types.

I'm not proposing a specific syntax extension for either the text or binary formats, just pointing out that a general way of defining efficient, variable-length tagging schemes for aligned pointers via abstract variant types is possible using the parameters outlined above. Word size (determined by the architecture) and variant cardinality (determined by the variant type declaration) are the only required parameters, leaving all axis parameters optional. By default, 2-byte-alignment would be used, since that's the minimum alignment that makes tagging at all possible, but higher alignments, as well as additional lower and upper axes, can also be used if desired. Validation would be relatively straightforward to guarantee correct semantics, or throw an instantiation-time error (for instance, if the number of desired tags is greater than the number of possible word values).

If this sounds crazy, then I'd be curious to understand a specific scenario where it would not work out.

ouillie avatar Apr 26 '22 03:04 ouillie

Note that wasm32 and wasm64 refer to the pointer size of a Wasm linear memory, but this GC proposal does not depend on a linear memory at all. The structs and arrays (and i31s) in this proposal cannot be stored in linear memory and do not have observable pointers and therefore there is no way for a Wasm module to observe or depend on their alignment, nor is there any way for the specification to mandate anything about their alignment. Any pointer tagging schemes relevant to this proposal are purely engine implementation details and engines are already free to use any tagging scheme they want.

It's possible to compile a garbage collected language to use linear memory (either wasm32 or wasm64) by compiling in its own GC, and in that case the compiler is already free to use any tagging scheme it wants and does not depend on this GC proposal.

In a post-MVP proposal adding variant types as sketched in the post-MVP doc, it may be useful to be able to provide a hint about what tagging scheme the engine should prefer, but ultimately the engine would be free to use any tagging scheme it wants since the difference would not be observable.

tlively avatar Apr 26 '22 23:04 tlively

I'd push back on "the difference would not be observable". i31ref prescribes 31 bits for unboxed scalars. Knowing exactly how many bits you can get out of them is vitally important for many non-integer types. "It is left to the engine to pick an efficient representation for the required tags" opens the door to non-portable code. I think there should at least be guidance about how many bits can be expected for a given variant in the spec.

ouillie avatar Apr 27 '22 06:04 ouillie

@wm-noble, it's guaranteed that i31ref provides 31 bits of significant payload on all hardware. Still, the engine can pick whatever tagging scheme it seems fit on a given hardware to achieve that. That's not observable and thereby does not affect portability. It merely constrains the engine to use few enough bits for tagging of integers, e.g., the bare minimum of 1 bit on 32 bit hardware – how that bit is interpreted, where it is located in a machine word, how superfluous bits are handled (on 64 bit hardware), and how many additional tag bits and alignment are used for pointers, or even whether or not pointers are compressed to 32 bit on 64 bit hardware, is all up to the engine and unobservable.

There is no portable way to expose additional tag bits directly, since Wasm has to run portably on 32 bit hardware or 64 bit hardware with pointer compression. We have discussed the possibility of introducing variant types that would abstract away additional tagging and allow the most efficient implementation (1st-level vs 2nd-level tagging) for each engine. We have also discussed attaching static data to RTTs, as a means for storing additional fields like secondary custom tags off-object. But as you say, all that's post-MVP.

rossberg avatar Apr 27 '22 07:04 rossberg

If 31 bits is the best that can be done I think many languages wanting pointer tagging would end up having to implement their own gc. Is it better to put that burden on language implementors as opposed to making engine implementations on 32-bit platforms provide more bits at the cost of silent allocations? Seems like, as proposed, 32 bit platforms would have to do a lot of silent allocations either way to support variants.

Just my 2 cents.

ouillie avatar Apr 29 '22 00:04 ouillie

I think we generally want to avoid silent allocations in Wasm because they introduce performance nondeterminism. Predictable performance is a key advantage of Wasm, and we have so far achieved that by only offering the lowest common denominator that can be supported on broadly available hardware. If it becomes a critical need to pack (source-level) variants tighter, I'd rather us consider a more general union mechanism that doesn't implicitly constrain the implementation of ref. E.g. the union of a ref and a, say, 53 (or 63?) bit integer would be a new type that wouldn't be assignable to anyref. On 32-bit platforms then, it might become multiple words.

titzer avatar Apr 29 '22 16:04 titzer

Closing this because we won't be adding any control over tagging schemes to the MVP, but PRs adding ideas for optimization hints or other features to the post-MVP doc are very welcome.

tlively avatar Nov 01 '22 21:11 tlively