gc
Remove redundant typeidx on gc instructions
This removes the remaining redundant type indices on GC instructions, following the discussion in WebAssembly/function-references#27 and #238. It makes GC instructions consistent with what we already decided on for other ref instructions, via WebAssembly/reference-types#99 and WebAssembly/function-references#31.
See MVP.md for the relevant changes. Interpreter and test suite are adjusted accordingly.
@jakobkummerow, PTAL.
@titzer, I know that you have argued against this change. However, we already do the same for other instructions, so do you think an interpreter can actually save much complexity if there are some instructions that have annotations while others don't? It seems like the general infrastructure for side tables is needed either way.
According to @askeksa-google's measurements, this change saves 3.5-5% in code size. It should also reduce validation time, since it avoids redundant checks.
Removing these annotations from static code and instead deriving them during validation could benefit interpreter performance as well. In particular, an interpreter's side table could store more relevant information directly, e.g., field offset + field type, instead of just the index of a struct type that it would have to look up and inspect on each access, which makes accesses rather expensive.
I wasn't aware that we reached consensus on this issue, so I think this PR is a little premature, though I do appreciate the amount of work done here.
do you think an interpreter can actually save much complexity if there are some instructions that have annotations while others don't? It seems like the general infrastructure for side tables is needed either way.
Currently Wizard only uses side table entries for control flow, and I think it's best to avoid requiring new kinds of entries if possible. Keep in mind that side-table entries are a space cost per instruction, as opposed to other choices like module-relative field indices, which would keep space costs per field declaration.
The only other option (I mentioned in #238) is to put field metadata hanging off object headers. In my opinion, looking at the whole system, that's a mixup of static and dynamic information that points to a clear design mistake in the bytecode. I think it makes more sense to keep metadata hanging off the module (instance) or the code (sidetable), whenever possible, and not objects themselves. (Just imagine what happens when an engine completes tier-up from interpreter to all-JIT for a module; at that point it should be able to free the space for interpreter side tables and module metadata. This would be impossible to do in a multi-module setting if metadata hangs off objects, because other modules that have not completed tiering up need to indirect through objects to find field metadata--that's a clear design flaw IMHO).
As for the other instructions like ref.is_null that eschewed static information, the consequence has been that it impacts runtime representations; eventually Wizard just gave up and settled on one representation for all null references. I don't know if V8 has to do something weird because of JavaScript's null; maybe it just chose JS null as Wasm null.
Also, just for the record, field access for GC objects in the interpreter is not going to be pretty. Since we want to pack objects and have unboxed representations, and interpreters usually have value tags, there are a number of things that need to be dealt with such as a.) field's actual offset into object, b.) field's size in bytes (typically 4 or 8 for non-packed fields), c.) field type's value tag. There'll be branches. It won't be pretty :-)
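As a concrete sketch of those three pieces of per-field metadata: the table layout, tag names, and object encoding below are all invented for illustration (they are not Wizard's or any real engine's representation), but they show why a struct.get in a tagging interpreter ends up branching on the field's tag even once offset and size are known:

```python
import struct

# Hypothetical per-field metadata an interpreter must consult for struct.get:
# (type_idx, field_idx) -> (a: byte offset, b: byte size, c: value tag)
FIELD_META = {
    (0, 0): (0, 4, "i32"),
    (0, 1): (4, 8, "f64"),
}

def struct_get(obj_bytes, type_idx, field_idx):
    """Load a packed, unboxed field and box it with the value tag
    that the interpreter's value stack slots expect."""
    offset, size, tag = FIELD_META[(type_idx, field_idx)]
    raw = obj_bytes[offset:offset + size]
    if tag == "i32":                       # branch on value tag...
        return ("i32", struct.unpack("<i", raw)[0])
    elif tag == "f64":                     # ...one arm per representation
        return ("f64", struct.unpack("<d", raw)[0])
    raise NotImplementedError(tag)

# A packed object with an i32 field followed by an f64 field.
obj = struct.pack("<i", 42) + struct.pack("<d", 3.5)
```

Whether the (offset, size, tag) triple lives in a side table, a module-local table, or elsewhere is exactly the placement question discussed above; the branching itself is there no matter where the metadata sits.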
So please interpret (no pun intended) my above comment in terms of proper placing of metadata, as it probably makes only 1 or 2 loads' difference for an interpreter, no matter where you put that metadata.
@jakobkummerow Hmm, that's an interesting idea to use the value type tags, but after a little bit of thinking, I think it has other downsides. Wizard does use value type tags for value stack slots, but they are broad classes (i32, i64, f32, f64, ref). I don't think it's good to rely on them for anything other than distinguishing value representations for when Wizard's GC scans value stacks, because that creates complexity elsewhere (e.g. a given reference could now have multiple different type tags, depending on where it came from, so comparing tagged values for equality in the engine is more complex). Requiring dynamic type tags to recover lost static information also eliminates an advantage of a conservative GC, if an engine chooses that. (E.g. I am talking to JSC folks about experimenting with Wizard's interpreter design in JSC, which does support conservative stack scanning, and would remove the need for type tags at all).
In short, I'll repeat what I've said before. Removing static type information from bytecodes because it is (or can only be) recovered from validation is a design mistake. We made that design mistake once with select and I think this change would be even worse. It removes many alternatives in runtime system design.
Field Accessors solved multiple problems. One of those problems can be addressed by adding more instructions in the future. But another problem cannot be fixed by adding instructions. This change makes type-checking exponential time when combined with support for multiple upper bounds, which is important to support for a number of Post-MVP features. So, this PR should probably include an accompanying change (such as one of the ones described in #238) to preserve the extensibility of the MVP. I can write that up, possibly tomorrow, to give a sense of what that would look like.
@titzer: In V8, our solution is that we've refactored the former ValueType (which was as coarse as you describe) into a bit field: now we have 5 bits for i32|...|ref|rtt (we call it "kind"), 20 bits for the referenced heap type if kind==ref|optref|rtt|rtt_with_depth, and another 6 bits for the depth if kind==rtt_with_depth. Depending on use case, we can check for just the kind (e.g. when finding references to scan with the GC) or the other fields as well (e.g. when we want to know which specific struct type it is). I assume that a similar design would work for an interpreter, but I haven't tried it.
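As an illustration of that layout, here is a minimal bit-packing sketch. The bit widths match the description above (5 + 20 + 6), but the kind codes and helper names are made up for illustration; V8's actual enum values and accessors differ:

```python
# Sketched bit field:
#   bits 0-4   : kind (i32 | i64 | f32 | f64 | ref | optref | rtt | ...)
#   bits 5-24  : referenced heap type (when kind is ref/optref/rtt/rtt_with_depth)
#   bits 25-30 : depth (when kind is rtt_with_depth)
KIND_BITS, HEAP_BITS, DEPTH_BITS = 5, 20, 6
KIND_REF, KIND_RTT_WITH_DEPTH = 4, 7  # illustrative kind codes only

def pack_value_type(kind, heap_type=0, depth=0):
    assert kind < (1 << KIND_BITS)
    assert heap_type < (1 << HEAP_BITS)
    assert depth < (1 << DEPTH_BITS)
    return kind | (heap_type << KIND_BITS) | (depth << (KIND_BITS + HEAP_BITS))

def kind_of(vt):
    # GC scanning only needs this cheap check.
    return vt & ((1 << KIND_BITS) - 1)

def heap_type_of(vt):
    # Consulted only when the specific struct/array type matters.
    return (vt >> KIND_BITS) & ((1 << HEAP_BITS) - 1)

def depth_of(vt):
    return (vt >> (KIND_BITS + HEAP_BITS)) & ((1 << DEPTH_BITS) - 1)

vt = pack_value_type(KIND_RTT_WITH_DEPTH, heap_type=17, depth=3)
```

The point of the design is visible in the accessors: a use site can pay for only as much of the type as it needs.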
@RossTate: That's all fair; all I'm saying is that landing this PR now-ish doesn't prevent us from adding Field Accessors to the instructions that are losing their static type annotations here, if we decide (next week or next month or whenever) to do so. Regarding exponential-time checking in particular (which of course nobody wants), I'd assume that that, too, can be avoided with new instructions: the old instructions could simply not validate with types that have multiple upper bounds.
I'd assume that that, too, can be avoided with new instructions: the old instructions could simply not validate with types that have multiple upper bounds.
Unfortunately your suggestion is essentially to break subsumption, a property that's important for keeping transformations like function inlining type-preserving. In fact, removing type indices only works because you have subsumption, so your fix might actually make type-checking exponential time itself.
So I don't think it's safe to assume we can solve the problem by adding new instructions. The accompanying change I have in mind is simpler anyways.
So WebAssembly has the subsumption rule/property that implies that if an instruction type-checks with ref A on the (input) stack, then it also type-checks with ref B on the (input) stack whenever B is a subtype of A, i.e. B <: A. Besides formalizing an intuition for what subtyping means, subsumption is also very important for transformability of code (e.g. inlining) and reduction of code size (otherwise you need explicit upcasts all over the place).
But subsumption can be problematic for decidability (and yet, at the same time, help with decidability). The issue here is that if you have a ref X on the stack, where X is some "abstract" type variable, and you have some instruction like struct.get that expects some more "concrete" type, then subsumption requires you to enumerate through all the concrete supertypes of X in order to see if any of them make the instruction type-check. And, since the output type of struct.get depends on the choice of concrete type, you get exponential type-checking if there is no "principal" choice.
The way we can fix this is to encode how iTalX addressed the problem: ensure that either every abstract type variable has a principal concrete supertype or is uninstantiable (i.e. there is no concrete type that satisfies the constraints on the type variable, which implies the current code location is unreachable). How do we do these? Well, if our concrete types (i.e. types providing fields) are nominal types, we can restrict definitions of new nominal types to specify at most one nominal supertype. This makes the nominal subtyping hierarchy into a tree, which ensures a nice property: a (finite) collection of nominal types has a common nominal subtype if and only if one nominal type in the collection is a subtype of all the others.
As a consequence, to type-check struct.get when there's a ref X on the stack, we can look through the concrete upper bounds of X: if one of them is a subtype of the rest, then we use the field type that one specifies; otherwise, X is uninstantiable and we deem the instruction's return type to be unreachable.
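As a sketch, that check is cheap when the nominal hierarchy is a tree. The type names and the `supertype` map below are hypothetical; the point is only that a principal (most derived) bound either exists or the variable is uninstantiable:

```python
# Assumes each nominal type declares at most one supertype,
# so the subtyping hierarchy is a forest.

def is_subtype(sub, sup, supertype):
    """Walk the supertype chain upward from `sub` looking for `sup`."""
    while sub is not None:
        if sub == sup:
            return True
        sub = supertype.get(sub)
    return False

def principal_bound(bounds, supertype):
    """Return the bound that is a subtype of all the others, or None.
    None means no common subtype exists, i.e. X is uninstantiable
    and the instruction is in unreachable code."""
    for b in bounds:
        if all(is_subtype(b, other, supertype) for other in bounds):
            return b
    return None

# Example hierarchy: Object <- Shape <- Circle, and Object <- Widget.
supertype = {"Shape": "Object", "Circle": "Shape", "Widget": "Object"}
```

With this, type-checking struct.get against a ref X reduces to one call to principal_bound over X's declared concrete bounds, rather than an enumeration of alternatives.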
Now, it's already the case that we restrict nominal type definitions to declaring at most one supertype. So why does this require a change? Well, the issue is that the above type-checking rule for struct.get is only sound if we have full knowledge of the subtyping relationships between the concrete supertypes of X. But if we can arbitrarily import two separate concrete types A and B, then an X that is a subtype of both A and B might be instantiable even if neither is a subtype of the other according to what the importing module knows of them. So we need to restrict the arbitrariness of imports.
One way you might think to do so is to say that if you import both A and B without any subtyping relation between them, then they must be unrelated in the exporting module. That's along the right line, and it does make type-checking for struct.get sound, but unfortunately it breaks module composition. For example, if you combine one module importing A with another module importing B, the combined module importing both A and B now is restricted in how they could be instantiated even though the original separate modules had no such restriction.
To solve this, we can add a notion of "hierarchies". A module can define/import hierarchies, and every hierarchy provides a root/top nominal type. When you define/import a nominal type, you always specify a nominal supertype, which ensures that all nominal types belong to some hierarchy. If you import two nominal types from the same hierarchy, then if your imports don't explicitly say they're related, it must be the case that they're not related in the exporting module. If you import/parameterize-by an abstract type variable, you specify the hierarchy that it belongs to along with as many upper bounds of abstract/concrete types from that hierarchy as you want (but you don't specify any fields, as all fields are derived from the upper bounds).
This keeps everything efficiently decidable, keeps modules composable, and makes room for some useful Post-MVP extensions (and one thing regarding v-tables that might make sense for the MVP, but I'll leave that for later if its utility comes up).
With the rationale done, here are the accompanying changes to this document:
- eliminate the non-_extending type definitions
- add a hierarchy type definition code whose immediates are 1, 0, vec(fieldtype)
If a u32 is used to denote a nominal type and references the hierarchy type definition, it denotes the "top" nominal type of the hierarchy, whose fields are the specified vector.
The reason for the immediates 1 and 0 is that, in the Post-MVP, it will be useful to associate more things with a nominal name (in order to support things like generics). So the 1 indicates that there's one element in the vector, and 0, vec(fieldtype) defines that element. The 0 then is a code indicating that the element specifies a list of fields.
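To make the proposed immediates concrete, here is a hypothetical encoder for such a hierarchy definition. Only the 1, the 0, and the vector come from the description above; the one-byte fieldtype stand-in is a placeholder, not the actual binary format:

```python
def leb128_u(n):
    """Encode an unsigned integer as LEB128, as Wasm immediates are."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_hierarchy(field_types):
    body = leb128_u(1)                 # one element in the vector of members
    body += leb128_u(0)                # code 0: the member is a list of fields
    body += leb128_u(len(field_types))
    for ft in field_types:
        body += bytes([ft])            # placeholder one-byte fieldtype
    return body
```

So a hierarchy root with two fields would encode as the byte sequence 1, 0, 2 followed by the two fieldtypes.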
@jakobkummerow Do those accompanying changes seem feasible? (There are also changes to type imports, but my understanding is that type imports is not currently implemented, so I'm not going into those.)
(As a separate note, I would suggest renaming _extending to _refining, since those codes are letting you refine the existing fields of the supertype, and since there will be uses for just extending a supertype with new fields.)
@RossTate
Do those accompanying changes seem feasible?
No, not at this time. The stated goal (see #234) is to maintain the "structural" type definitions we've had so far, in order to allow comparative evaluations, so we're not going to eliminate them for now.
As for the more distant future, nothing is set in stone. We can add hierarchy whenever we want. (Personally I'd like to see a somewhat more complete description of its semantics, and some discussion of its merits, before starting to implement.)
After reading your post, I also still think that we could (theoretically; not making a statement on likelihood) ship 0xfb03 = struct.get (without type immediate, as specified in this PR) in the MVP, and afterwards (all at once):
(1) refine its definition (in a backwards-compatible manner, i.e. preserving existing semantics) to require an element with unique supertype chain on the value stack ("a ref X with concrete type X", as you call it)
(2) introduce new types that wouldn't fulfill this property
(3) introduce a new 0xfbYZ = struct.get_new_fancy with the ability to deal with these types.
Alternatively, these things could happen at some point between now and shipping the MVP; in which case we could re-evaluate whether the old struct.get is still needed then; I'd guess that simple accesses to simple types will likely still be useful for the common (probably?) case of entirely intra-module-contained operations.
Put differently: before this PR here, the type immediate of struct.get is/was 100% redundant information: the value actually found on the value stack had to match it (i.e. its type must be a subtype of the immediate), else validation failure. I don't see how dropping this redundant information paints us into any unrecoverable corner.
Oh, I forgot about keeping the structural stuff around. In that case, the change would be to require that the u32 immediate for _extending type definitions must refer to either hierarchy or other _extending type definitions.
Put differently: before this PR here, the type immediate of struct.get is/was 100% redundant information: the value actually found on the value stack had to match it (i.e. its type must be a subtype of the immediate), else validation failure. I don't see how dropping this redundant information paints us into any unrecoverable corner.
If "match it" meant "be equal to", then it would be redundant. But, you said "subtype of". The effect of that is that the output type of struct.get with an explicit type is determined by that explicit type rather than by the type of the value on the stack, in a potentially non-deterministic manner (due to subtyping). If you remove that immediate, then the output type of struct.get is determined by the type of the value on the stack. Last year, I presented a proof (slides) that (1) this change to index-only instructions, combined with (2) this core WebAssembly typing rule and (3) multiple arbitrary upper bounds on type variables/imports, makes type-checking NP-Hard (as type-checking solves vertex coloring).
I can't really do anything about the consequence of that theorem. It just exists as a truth we have to deal with. What I can do is prevent one of the three assumptions of that theorem from holding. Field accessors work to prevent (1), while also providing additional functionality at the same time. Hierarchies work to prevent (3), by ensuring that either there's necessarily a principal upper bound (if any) that provides fields or the instruction is provably unreachable. Any solution that does not prevent either (1) or (3) necessarily must prevent (2), i.e. must remove a key typing rule from core WebAssembly.
Do you agree that with the current type system, (3) can't happen?
If so, then what I'm saying is that we can put mitigations in place against (1)+(2)+(3) occurring together if and when we're making (3) a possibility.
Pragmatically: right now, in a single pass over any given function, we can determine in O(1) and unambiguously the type for any value on the stack, so we can also deterministically and efficiently determine the type of any field loaded by a struct.get.
For the sake of argument, suppose we ship the current rules in the MVP, and then afterwards want to add the ability for modules to import or define types with multiple upper bounds. Backwards-compatibility means we can't break MVP-conformant modules. So at that time, we can prevent (1) by changing the validation rules of struct.get to include "if the type of the n-th field of the value on the stack cannot be uniquely and deterministically determined in constant time, the instruction fails to validate" (or similar). That would mean that old modules continue to behave as they used to; while newly created modules that want to make use of the new types can be built to use struct.get_new_fancy instead.
Another thing that's becoming clear from these discussions is that struct.get and struct.new_with_rtt fall into slightly different categories. IIUC, @RossTate's concerns only apply to the former; so one way to address them would be to limit the type-annotation-dropping change to just the {struct,array}.new_{,default_}with_rtt instructions, where we currently require exact equality of the type immediate and the RTT's type. (That said, I still insist that that whole discussion doesn't need to block this PR from landing; if needed we can revert half of it later, as opposed to landing only half of it now.)
Another thing that's becoming clear from these discussions is that struct.get and struct.new_with_rtt fall into slightly different categories. IIUC, @RossTate's concerns only apply to the former; so one way to address them would be to limit the type-annotation-dropping change to just the {struct,array}.new_{,default_}with_rtt instructions, where we currently require exact equality of the type immediate and the RTT's type.
Yes, this is fine. Because rtt is invariant, it uniquely determines the type that directs type-checking.
So at that time, we can prevent (1) by changing the validation rules of struct.get to include "if the type of the n-th field of the value on the stack cannot be uniquely and deterministically determined in constant time, the instruction fails to validate" (or similar).
Let's suppose ref A has the property that the type of its n-th field is uniquely/deterministically determined to be tA, and as such struct.get accepts it. Let's suppose an instruction instr preceding struct.get outputs ref X, and ref X is a subtype of ref A but does not have a uniquely determined n-th field. According to this core WebAssembly rule, the instruction sequence instr; struct.get must still type-check with output type tA despite the field not being uniquely determined for the type resulting from instr. This applies to every supertype of ref X that has a uniquely determined n-th field. So if ref X is also a subtype of ref B whose n-th field's type is uniquely tB, then instr; struct.get must also be accepted with output type tB. This makes your restricted type system still NP-Hard.
That's the gravity of subsumptive subtyping, which is the property that core rule captures. C# threw out that rule in order to be efficiently type-checkable. As a consequence, it is quite a pain to safely perform code transformations on C# code. That's not a huge problem for C#, though, because most transformations are actually done at the bytecode level, which uses field accessors to keep type-checking efficiently decidable. (And our typed assembly language for C# used hierarchies to keep type-checking efficiently decidable.)

Java had to throw out declarative typing rules altogether, because people were having difficulties porting their code between Java compilers, which were necessarily all incomplete implementations of the type system. For Java 8, they got all the compiler writers together and standardized a type-checking algorithm they would all use, which became part of the spec.

In the process, they developed a superalgorithm that was more precise than all the other algorithms. But they found that the improvement actually broke some key codebases because it discovered ambiguities (along the lines of non-uniquely determined field types - though really method types) in them that none of the previous systems had noticed. So they broke backwards compatibility and worked with those teams to change the code to conform to the new standard. None of this affected the Java bytecode, though, because the bytecode uses field accessors.
So I understand the appeal of the simplicity of this change, and I recognize that it's unintuitive how problematic the change is permanently, but throughout my decade of consulting with language teams I have seen many times how these seemingly benign little things end up biting you in the ass.
The measurements mentioned by @rossberg which indicated a code size saving of 3.5% uncompressed and 5% compressed just removed the type from struct.get and struct.set. With the type removed from all of the instructions in this PR, I see a saving of 5.8% uncompressed and 6.4% compressed (zopfli).
@askeksa-google Do you have numbers if the field index were not type-relative but module relative (i.e. bigger)?
That comparison, without the change in this PR, is described here (2% smaller uncompressed, 0.2% bigger zopfli-compressed for the version that avoids the redundancy of superclass fields). With this change on top (for all the other instructions), I see a total saving of 4.2% uncompressed and 1.1% zopfli-compressed. So compared to leaving out the type, module-global field indices cost more than 5% on the compressed size in both cases.
Let's suppose ref A has the property that the type of its n-th field is uniquely/deterministically determined to be tA, and as such struct.get accepts it. Let's suppose an instruction instr preceding struct.get outputs ref X, and ref X is a subtype of ref A but does not have a uniquely determined n-th field.
@RossTate, could you provide a (hypothetical) concrete example for this situation? I follow your logic from here, but I'm not sure where this premise would apply. Could we add extra restrictions to disallow X from being a subtype of A for all X and A that would otherwise satisfy this premise?
Sure thing, but can you first confirm that you are asking for a Post-MVP application where support for the described situation (i.e. support for type variables/imports with multiple upper bounds) would be helpful? I want to make sure I answer the intended question 😃
Yes, I am asking for a hypothetical Post-MVP extension here :) I don't think we should spend a lot of time trying to predict whether we will or will not want to add any particular post-MVP feature, but I would appreciate a taste of how these problems might come up.
I was also chatting with @jakobkummerow about this offline, and he pointed out that in the worst case we could completely disallow struct.get in modules that use some fancy new kind of type and mandate that struct.get_new_fancy be used instead. That would be stronger than just disallowing struct.get on the new kinds of types and should work around any typing issues AFAICT.
I would appreciate a taste of how these problems might come up.
Happy to oblige. There are two use cases that I know of. The first is multiple upper bounds on type imports, which is useful for things like Jawa. Now, bounds on type imports are static, so we have more control here and might be able to come up with a means to restrict the collection of upper bounds in order to ensure a principal "concrete" upper bound, though it might not be possible to do so in a way that supports useful module operations like partial instantiation.
Regardless, the second use case is more difficult. This use case is user-defined casting. Right now we're building in a specific casting model, which is essentially the JVM's model for classes. That's perfectly good for our pressing use cases, but it's not great for a number of languages. And, if we ever get to generics, I think it's unlikely that any baked-in casting model will work well. For example, I know of 4 casting models that are used in practice for Pair<A,B>/A * B, each of which has very different memory-efficiency/time-efficiency/expressiveness characteristics that make them each better suited to different languages.
Type-checking user-defined casting is all about reasoning about type variables and bounds. As a first step, rather than using ref Foo to denote the type of an instance of Foo, you use exactref X where X is a type variable representing the (unknown) exact type of the instance and has Foo as a (concrete) upper bound. If the user wants to implement something like rtt-style casts, then the reference's v-table/object-descriptor has a table of run-time identifiers of supertypes of X, i.e. an array (exists y :> X. Identifier(y)). If the user wants to cast the instance to, say, Bar, they load an element of this array. This element has an existential type, so the type-checker (or a pseudo-instruction) "opens/unpacks" the existential by introducing a fresh type variable with the respective constraint. This results in, say, a type variable Z, which is now a new upper bound on X. This means that we're "dynamically" introducing upper bounds on type variables, meaning we can't rely on solutions that constrain the collection of upper bounds. Next, we check that the resulting unknown identifier, whose type is Identifier(Z), is equal to some run-time identifier for Bar, whose type is Identifier(Bar). If so, on the "equal" branch we deduce that Z = Bar, and consequently that X <: Bar. This means we know that the reference has all the fields provided by Bar and can access them in this branch (yay!). It also means that X now has two upper bounds: Foo and Bar. Plus, you can show that subsumption implies that this needs to work regardless of what Foo is, which means we really need a way to type-check struct.get/set on a ref X even for arbitrary combinations of concrete upper bounds on X.
Phew, sorry for the big dense paragraph. Note that in the above I used existentials because I know you, specifically, are familiar with them. I've been working on a more engine-friendly formulation of the ideas based on feedback from other discussions. But regardless of the specifics, type variables or abstract types and multiple concrete upper bounds seem to be fundamental to the space, which is why I'm pressing to maintain forwards compatibility with them.
we could completely disallow struct.get in modules that use some fancy new kind of type and mandate that struct.get_new_fancy be used instead
That seems heavy-handed, especially compared to declaring types slightly differently. But what's prompting that (correct) suggestion appears to be a desire to pull this in now, and since there now seems to be an informed understanding that doing so comes with the cost of a further change (ideally to the MVP, so that the change can be to just type definitions rather than instructions), I'm fine with pulling this in now and collaborating on how best to make that change in a separate thread. (Though first we should also conclude discussion of the other issue, regarding interpreters, once this one is resolved.)
Thanks for the examples, @RossTate! My curiosity is satisfied.
It sounds good to me to pull this in once(/if) we resolve other discussions about it with the knowledge that it is a potential future compatibility hazard and with willingness to roll it back if beneficial in the future.
I suppose it boils down to how likely we think it is that we'll need a variant of struct.get with static type annotation in the future. Apparently @RossTate thinks that this is very likely :-)
As an incremental way to make progress, @rossberg, how do you feel about limiting this change to the new family of instructions for now? That part seems to be uncontroversial (and will at least save some bytes, though likely far fewer than the whole thing), and that way we can allow ourselves some more time to come to a conclusion on the get/set instructions. Going back and forth does have a cost, so while it's certainly possible to land everything now and revert partially later if needed, we can also optimize for reducing churn if we think that that partial revert is going to happen more likely than not.
I'm confused. I said it's fine to not ever have any annotation on struct.get so long as we slightly change how types are defined. That is, the (future) accompanying change (for the MVP) would be a change to the Types section and nowhere else.
@RossTate Oh, sorry, then I misunderstood "we really need a way to type-check struct.get/set on a ref X even for arbitrary combinations of concrete upper bounds on X". If you're happy with this change after all then I guess we can land it as it is.
Ah. That way could either be to have an annotation (to prevent ambiguities when there would otherwise be multiple incomparable applicable concrete types) or to impose some more structure (such as hierarchies) in the MVP on how concrete types are defined/imported (so that one of the applicable concrete types is always guaranteed to subsume the rest).
I'm happy with either way, and at present it seems the latter has significant impact on reducing compressed binary size.
I think we should not optimize for binary size until we are almost finished with the design, and then we should do so carefully and in a coordinated fashion. There are too many moving parts right now.
We can productively discuss how to maintain principal types in the presence of multiple bounds when we add the latter. There is more than one way that can be solved, regardless of this PR.
#160 has been open for nearly a year, with no particularly concrete suggestions on how to address the issue, let alone in a way that is expressive enough to address the use case I outlined above (based on how the only typed assembly languages that have scaled to major languages support casts and generics). At present the only solution I know of requires changing the (type section of the) MVP (sometime after this PR), and the reasoning above strongly suggests that the MVP would necessarily not be suitably extensible without some such change. I would be happy to learn of alternatives in #160, if you would like to offer them, but in the meanwhile I think we should keep the door open for at least one solution that is known to work in this space.
As I already stated, I am fine with this PR being pulled now, given that others have expressed enough understanding of the issue.
@RossTate:
#160 has been open for nearly a year, with no particularly concrete suggestions on how to address the issue
That's not unexpected, because it's a Post-MVP question, and we have more immediate problems to work on.
That's fine. I'm just pointing out that, as a consequence of not working on that issue, there is currently only one known annotation-free solution to it, and so we should keep the door open for that solution. It's risk mitigation.
Is it a given that a side table requires space proportional to the number of instructions? Can't it be represented more compactly?
Just to be clear, it's space proportional to the instructions where annotations would now be missing (not unrelated, existing instructions). So it basically undoes any over-the-wire savings gained here. And then some, as the side-table entries are bigger. We could drill into the details and options, but just to summarize: I think the most space-efficient solution would be to have a side-table entry that is just an index into a module-local table that contains the field offset, field size (or maybe mask), and value tag. So that'd basically end up making the LEB struct-relative field index immediate redundant, which makes it fairly obvious to me that that's the wrong choice. But we could just as well put in the module-relative field index as the LEB immediate.
Furthermore, my understanding was that you need a side table anyway. So haven't you already sunk that cost, and would just be filling it with additional entries?
Again, it adds new types of entries and they're encoded differently. It just makes things more complicated.
Assuming this can be represented compactly in an interpreter, it seems like a clear win not to require this information to be present in all code, and only have it take up space in engines that require it.
I don't agree that it is a clear win, even for space. It's only a win for binary size, and that ignores memory footprint. A major selling point of an interpreter is the memory savings, and this works against that. It just inflates the additional metadata you need besides the bytecode.
Besides, I think removing "redundant" type annotations is fraught with problems, as we already saw with select and as others have pointed out in this thread.
(I'd assume it's also a clear win in terms of performance, because there is less to check. That's even for an interpreter that needs it, because it still costs less to store the implicit information from validation than to validate explicit information.)
I don't think we need to optimize for interpreter performance; memory is far more important (short of really poor design choices). There will be some dozen instructions for field access no matter what. But given that, there's at least one LEB immediate to decode (or skip) either way; adding side-table operations on top is just additional work. From an interpreter's perspective, the fewer things there are to decode, the better, so the LEB immediate might as well carry useful information.
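For concreteness, decoding that LEB immediate looks like this. LEB128 is the standard variable-length integer encoding used for Wasm immediates; the example field index of 300 is made up:

```python
def decode_uleb128(code: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer, returning (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = code[pos]
        pos += 1
        result |= (byte & 0x7F) << shift      # low 7 bits are payload
        if (byte & 0x80) == 0:                # high bit clear: last byte
            return result, pos
        shift += 7

# A field-index immediate of 300 encodes as the two bytes 0xAC 0x02.
value, next_pos = decode_uleb128(bytes([0xAC, 0x02]), 0)
```

The interpreter pays this decode (or skip) cost per field-access instruction regardless of whether a side table is also consulted, which is the point being made above.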
The main motivation for making this PR is to make the proposal consistent, both with itself and with preceding proposals. Right now, there are instructions that still have these annotations, and others that don't. Clearly, that doesn't make sense. So something needs changing either way.
Yes, we should be consistent. I think annotations are better future-proofing against unforeseen disadvantages (e.g., select), and given that, with care we can easily make annotations useful for all tiers.
I am also coming around to the idea of importing fields across modules, and using a module-relative field index for field access instructions is a path to that.
Just to be clear, it's space proportional to the instructions where annotations would now be missing (not unrelated, existing instructions).
Ah, okay, that's what I would have hoped. Thanks for the clarification.
Assuming this can be represented compactly in an interpreter, it seems like a clear win not to require this information to be present in all code, and only have it take up space in engines that require it.
I don't agree that it is a clear win, even for space. It's only a win for binary size, and that ignores memory footprint. A major selling point of an interpreter is the memory savings, and this works against that. It just inflates the additional metadata you need besides the bytecode.
Hm, frankly, this sounds as if the reasoning is: the extra price for this space overhead should be paid everywhere, rather than just in interpreters that need it, so that these interpreters don't lose in comparison. I'm not sure that is the best global criterion for optimising the design. :)
Besides, I think removing "redundant" type annotations is fraught with problems, as we already saw with select and as others have pointed out in this thread.
True, but with select we addressed it without mandatory annotations in the common case, by only requiring them in the relevant cases. The same strategy would work if a comparable problem should ever come up with the instructions in question here (essentially what @jakobkummerow called struct.get_new_fancy above).
That said, I'm not too concerned that this situation would arise for the instructions in question (select is rather different as a generic data flow join), unless we screw up badly elsewhere.
(I'd assume it's also a clear win in terms of performance, because there is less to check. That's even for an interpreter that needs it, because it still costs less to store the implicit information from validation than to validate explicit information.)
I don't think we need to optimize for interpreter performance
Agreed, my point above was that every implementation will have a performance benefit, even interpreters.
I am also coming around to the idea of importing fields across modules, and using a module-relative field index for field access instructions is a path to that.
Yeah, I fail to see the benefit so far, other than obfuscating operational aspects that should naturally be explicit in an assembly-level language. But that's for another discussion thread.
I don't agree that it is a clear win, even for space. It's only a win for binary size, and that ignores memory footprint. A major selling point of an interpreter is the memory savings, and this works against that. It just inflates the additional metadata you need besides the bytecode.
Hm, frankly, this sounds as if the reasoning is: the extra price for this space overhead should be paid everywhere, rather than just in interpreters that need it, so that these interpreters don't lose in comparison. I'm not sure that is the best global criterion for optimising the design. :)
No, an interpreter will easily win over any JIT tier, which inflates code anywhere from 2x to 5x. This isn't an attempt to "make interpreters look better" by making the other tiers worse. In fact, I never think of tiers as being in competition; they are complementary. JITs can ditch the bytecode altogether, so whichever choice we make here costs them no space. Static compilers definitely can and will dump the bytecode, so it makes no difference there either. Rather, my point is that one of the major advantages of interpreters is space, and this choice reduces that complementary advantage in a multi-tier system.
I foresee that many systems will likely employ 3 tiers; in fact, JSC already does. I think Wizard will end up with 3 tiers in the end, too. The performance results from its fast interpreter are compelling enough that I think engines will consider adopting its in-place design.
In general, I think we should reason about whole systems and not just one aspect like binary size.
I don't think we need to optimize for interpreter performance Agreed, my point above was that every implementation will have a performance benefit, even interpreters.
I think we both agree that it makes no difference for the generated code, but codegen speed for baseline JITs is also affected by how much metadata they must reconstruct. In particular, a consequence of this change would be that a baseline JIT like Liftoff would also have to model types to determine fields. Liftoff has already been tuned to turn off several validation checks when re-walking bytecode (using advanced templating tricks in the common function body decoder), and that does make a measurable difference in codegen speed.
This is what I mean by taking a wider view of systems. "Redundant" annotations are only redundant w.r.t. the specific validation algorithm that reconstructs types in its abstract interpretation. There are a lot of other things that process bytecode, and changes like this force them to replicate parts of the validation algorithm just to recover information that should be present in the bytecode but is not encoded properly, e.g. in immediates. Even basic questions like "where is this field used in the code" could not be answered by simply iterating bytecode-to-bytecode. This information is present in Java class files, DEX files, C# assemblies, and every other bytecode format I can think of offhand, so this decision would be a deviation. That feels like a mistake; binary size does not seem worth it.
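The "replicate parts of the validation algorithm" point can be made concrete with a rough sketch: without a type annotation, a tool that walks bytecode must track an abstract type stack, just as the validator does, to learn which struct an unannotated field access refers to. The mini instruction set and names below are invented for illustration, not real Wasm opcodes:

```python
def infer_struct_for_field_access(instrs, local_types, struct_types):
    """Walk a tiny instruction list and return, for each unannotated
    'struct.get', the (struct type index, field index) it refers to."""
    stack, resolved = [], []
    for op, *imm in instrs:
        if op == "local.get":
            stack.append(local_types[imm[0]])        # push local's type
        elif op == "struct.get":
            struct_idx = stack.pop()                 # receiver's struct type
            field_idx = imm[0]
            resolved.append((struct_idx, field_idx))
            stack.append(struct_types[struct_idx][field_idx])
        elif op == "drop":
            stack.pop()
    return resolved

# Local 0 holds a reference to struct type 2, whose field 1 has type "i64".
struct_types = {2: ["i32", "i64"]}
uses = infer_struct_for_field_access(
    [("local.get", 0), ("struct.get", 1), ("drop",)],
    local_types={0: 2}, struct_types=struct_types)
# uses == [(2, 1)]
```

With an explicit type annotation on struct.get, the same question is answered by reading the immediate alone, with no stack simulation at all.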