threads icon indicating copy to clipboard operation
threads copied to clipboard

Make shared tables a validation failure

Open RReverser opened this issue 5 years ago • 15 comments

Current Spec changes in Overview list this change to the limits type, which is shared between memories and tables:

The limits type now has an additional field specifying whether the linear memory or table is shared:

limits ::= {min u32, max u32?, share}
share  ::= unshared | shared

However, all operators, JavaScript API changes and spec tests describe only what happens with the shared WebAssembly memory, and I can't find any mentions of shared tables behaviour.

I'd like to understand what are the expectations for the implementer here - is this just a spec bug and these limits types are supposed to be split with only memlimits having a shared variant, or is it something we plan to support in the future on tables too, but is expected to throw a validation error meanwhile.

RReverser avatar Jun 05 '20 01:06 RReverser

Actually, I see https://webassembly.github.io/threads/binary/types.html#binary-limits actually has slightly different types/notations here. I guess Overview is simply out of date?

RReverser avatar Jun 05 '20 01:06 RReverser

This can be factored in a number of (observably equivalent) ways, but what's in the spec draft currently assumes that the fusion of limits and shared flag is merely a detail of the binary format. Conceptually, they are rather separate things, which is what's reflected in the AST. Note that the binary grammar for table types enforces that the shared flag is 0.

rossberg avatar Jun 05 '20 07:06 rossberg

@rossberg Thanks. That notation (lim,0:limits, lim,1:limits) was also a bit confusing - is it documented somewhere? I don't think I've seen this one in particular before, usually it's either literals or identifiers.

Another question: I've noticed that

Note that the binary grammar for table types enforces that the shared flag is 0.

is covered by assert_invalid tests. Shouldn't this be better be declared as a malformed module, so that it could be checked earlier at the parsing level?

RReverser avatar Jun 05 '20 15:06 RReverser

It's a kind of pattern matching. IIRC I used it in a couple of other places in the main spec, but it may indeed not be properly introduced. It may be a bit too smart in this case.

Shouldn't this be better be declared as a malformed module

Yeah, either is a fine choice, but spec and tests should match up. The spec for this proposal still needs a lot of work.

rossberg avatar Jun 05 '20 16:06 rossberg

It may be a bit too smart in this case.

Yeah I have a feeling that it would be easier to grasp if it was written down as two separate types, one generic and one memory-specific.

Yeah, either is a fine choice, but spec and tests should match up.

To clarify - assert_invalid should match only things disallowed by validation phase, not the syntax, right? (I'm still unsure about the boundary between two)

If so, I think this restriction should result in malformed:

image

RReverser avatar Jun 05 '20 16:06 RReverser

I tend to adjusting the spec and make this a validation-time check. Seems more natural in this case.

rossberg avatar Jun 08 '20 06:06 rossberg

Seems more natural in this case.

Why? Or, more generally, are the rules on what should be checked at parsing time vs validation time codified (written down) anywhere?

To me, it seems that validation should be reserved for rules that can't be checked as part of the lexing - that is cross-section checks like "data count and number of data entries should match", "indices should be valid and pointing to real items", "global.get in constant expressions should be pointing to constant globals" etc.

Anything else can be encoded as part of the grammar and checked earlier.

Most of the errors already fit this split quite well and the only two counter-examples that I'm aware of so far that don't fit such separation so far seem to be this one (shared flag shouldn't be allowed on tables + shared flag must be combined with maximum limit) and another one in SIMD for lane indices (specific range of allowed values for given instruction vs arbitrary byte).

It seems worthwhile to adjust these examples rather than moving more rules to the validation stage.

RReverser avatar Jun 08 '20 15:06 RReverser

The question is whether something is part of the abstract syntax or not. In general, the abstract syntax tries to be faithful to what's expressible in the binary syntax, except where something can be treated as just a shorthand, or something is regarded an "encoding detail" of the binary format (like the DataCount section in the bulk instructions proposal).

But to some extend it's a choice. For example, we always had function types with multiple results in the abstract syntax, long before we actually allowed multi-values -- it was just invalid to have more than one. That seemed natural because the generalisation was on the trajectory.

In this case, it feels similarly natural because in the future we likely want to extend tables with sharability annotations as well (see our OOPSLA '19 paper).

rossberg avatar Jun 08 '20 15:06 rossberg

But to some extend it's a choice.

Right, I guess that's the blurry line that sometimes makes me uneasy. It makes sense for what you said about shared tables, and it could even be extended to potentially allowing shared memories without maximum limit in the future.

But then a counter-example to such choice is SIMD lane indices, which I mentioned above, which could be said to be expressible in binary format if they're defined as bytes, or they could be said to be not expressible if we define them as u1, u2, u3 and u4 for ImmLaneIdx{2, 4, 8, 16} correspondingly.

What defines which choice is correct? Arguably they won't (can't) even be extended in the future, so indices outside the range should be malformed, yet the tests currently check that they're only invalid...

RReverser avatar Jun 08 '20 18:06 RReverser

I haven't followed the SIMD proposal closely, but using uN might indeed make sense there. But it depends on whether there always is enough context in the binary grammar to trivially know which type is expected. If you'd need a union of u1..u4 somewhere then not distinguishing different lane types seems preferable.

rossberg avatar Jun 09 '20 06:06 rossberg

I think you could handle it in the binary grammar, since you always know the lane type from the instruction opcode. But if we wanted to handle the syntax as we do with other instructions (grouping them by the kind of instruction they are, e.g. extract_lane, replace_lane) then we'd need to have a union at that point.

binji avatar Jun 09 '20 20:06 binji

@rossberg To get back to the original issue - the resolution here is that someone will update the spec as per

I tend to adjusting the spec and make this a validation-time check. Seems more natural in this case.

correct?

RReverser avatar Jun 15 '20 20:06 RReverser

Yes, I think that's the right change to make here.

binji avatar Jun 15 '20 20:06 binji

Yep.

rossberg avatar Jun 16 '20 06:06 rossberg

@binji I see you renamed the issue to "Make shared tables a validation failure", but I think the resolution was to also have the maximum on memory limits be a validation failure rather than malformed one.

RReverser avatar Oct 19 '20 17:10 RReverser