Status of 0xFF opcode prefix
We're using the prefix 0xFF internally in SpiderMonkey to represent some private opcodes (for the asm.js -> wasm translator, among other things). I've been looking for spec prose, reference interpreter code, or issues that designate this prefix as reserved for the implementation, but I've found nothing so far. Yet I seem to remember that it is so. Does anyone else have any more specific information about this?
I also remember this to be the case, though I don't think it is documented anywhere except in the meeting notes when it was discussed. Edit: The exact wording on the poll was "Poll: 0xFF should be reserved prefix for when we run out"
@dtig, nice find. That's probably good enough for us for the time being.
Hm, interestingly, those notes also contain a (successful) poll that decided that "0xFF should be reserved prefix for when we run out" (of opcodes). That seems to mean something other than being reserved for implementations, though I don't know what exactly...
@rossberg, i agree, but it looks like we're not in real danger of running out any time soon. Meanwhile, the "unused" prefixes we had been using for some experimental work are in imminent danger of being used by new proposals, so this seemed like a good time to try to get our house in order.
I support reserving `0xFF` for internal use, as long as feasible, and would support spec text to that effect.
I'm still a bit confused about what precisely we want to state and what guarantees that would actually imply. It sounds almost like "0xff is available to implementations until it isn't", which seems a bit like a vacuous statement for a standard?
If the goal is to prevent proposals from clobbering 0xff if they don't have to, perhaps that rather is something to note in, e.g., the process doc or some design doc?
It looks like there is nothing actionable on the spec side for this, closing.
Can we add text to the spec specifically guaranteeing that 0xFF will not be used for future expansion, and will be left for implementations to use?
(see also https://github.com/WebAssembly/design/issues/1141)
"0xFF should be reserved prefix for when we run out" (of opcodes) Then, can we pick a different value for this, and note it somewhere?
This has recently come up for us because we would like to use it as a slow-path escape hatch, but it seems like multiple parties have interesting uses for it.
+1 for an explicit reservation of `0xFF` as illegal. Wizard uses an illegal opcode to implement instrumentation and it would be good to know there will always be a 1-byte illegal opcode.
That is all fine, but my earlier comments still stand:
- Technically, the opcode currently is illegal. Beyond that, it is not clear how the spec would be capable of saying anything normative about future versions of itself – given that it can always be modified, such a statement would be inherently meaningless. Like other backwards compat constraints, this is not spec-level but meta-level. At most, the spec could have an informal note explaining the intent.
- The intent seems to be in contradiction with the earlier CG vote dug up by @dtig, at least as I read it. So I believe this would require an overriding vote.
I've added a quick agenda item for the next meeting at https://github.com/WebAssembly/meetings/pull/1087. I'd also like to add some explicit polls there, so bringing this back for discussion. Would it make sense to explicitly reserve an opcode for instrumentation, and keep 0xFF for future use? Or is it reasonable to assume that implementations in the future can change their instrumentation opcodes to a new scheme (whatever we decide makes most sense in the new encoding scheme when we run out of opcode space)?
That is all fine, but my earlier comments still stand:
1. Technically, the opcode currently _is_ illegal. Beyond that, it is not clear how the spec would be capable of saying anything normative about future versions of itself – given that it can always be modified, such a statement would be inherently meaningless. Like other backwards compat constraints, this is not spec-level but meta-level. At most, the spec could have an informal note explaining the intent.
I will not be able to attend the Tuesday meeting, so posting my thoughts ahead of time.
If we wanted 0xFF to be illegal to use indefinitely and not just 'until we run out and need it', we could specify 0xFF as a named `invalid` instruction that is in the ast/binary format/text format but validation will always fail on. At that point, I believe we'd be fairly bound by backwards compat constraints to keep it. Having a simple invalid instruction in the text format could possibly be useful for testing too.
But I think it's a valid question whether we want this 'until we run out and need it'. My current opinion is that devoting a single byte encoding indefinitely for engines to use internally is a reasonable expense, but if others feel strongly I'm sure engines can always find opcode space for their own uses. `call $idx` with `$idx` above the maximum allowed functions is an interesting candidate.
If this is a 'until we run out and need it' situation, then I agree with Rossberg that this belongs in the process documentation, with maybe a non-normative note in the spec text.
Actually, specifying an `invalid` instruction seems like a great way to implement 'until we run out and need it,' since we are always allowed to make previously-invalid modules validate in the future. (If we weren't allowed to do that, we couldn't change anything at all.)
I won't be able to attend the next meeting either, so just a quick reply.
@eqrion:
If we wanted 0xFF to be illegal to use indefinitely and not just 'until we run out and need it', we could specify 0xFF as a named invalid instruction that is in the ast/binary format/text format but validation will always fail on.
Sorry, I don't think that's observationally different from the status quo, so I don't see what it would buy?
I really do think that this discussion is confusing spec and meta level concerns. The spec does not and can not specify (normatively) its backwards compatibility requirements, since those requirements would have to be "quantified" over all future specs, which it has no jurisdiction over.
If we want the spec to say something, then we could do so as an expression of intent, e.g. in an appendix.
I won't be able to attend the next meeting either, so just a quick reply.
@eqrion:
If we wanted 0xFF to be illegal to use indefinitely and not just 'until we run out and need it', we could specify 0xFF as a named invalid instruction that is in the ast/binary format/text format but validation will always fail on.
Sorry, I don't think that's observationally different from the status quo, so I don't see what it would buy?
By creating an instruction for it, the spec is creating a concept that users could actually write in the text format. You could imagine simple tests written for it that implementations run. Maybe someone would find a use for it somewhere.
Future specs will intend to be backwards compatible with the current spec. If an instruction was added where the entire purpose is to be 'invalid', making that instruction 'valid' would be a breaking change. It'd be a breaking change we very likely could get away with, so that's why I say it's 'fairly' binding.
I really do think that this discussion is confusing spec and meta level concerns. The spec does not and can not specify (normatively) its backwards compatibility requirements, since those requirements would have to be "quantified" over all future specs, which it has no jurisdiction over.
If we want the spec to say something, then we could do so as an expression of intent, e.g. in an appendix.
You're correct that any future spec is theoretically not bound by anything (including backwards compat) in the current spec. I'm just trying to say that there are stronger expressions of intent than a note in the appendix.
If an instruction was added where the entire purpose is to be 'invalid', making that instruction 'valid' would be a breaking change.
I agree that this is true in plain language! Historically, we've expected that our backwards compatibility obligations only extend to valid programs (edit: and also don't necessarily extend to the text format), and that we are free to make any invalid program validate in future, hence @rossberg's discomfort with the term.
I don't think this is a point that's worth wasting much mental energy over, and I agree that explicitly codifying `invalid` in the instruction list would be a stronger (and appropriate) expression of intent for future editions of the specification than adding a note in the appendix. It's just worth noting that if we continue to believe in the above definition of "backwards compatibility", then even making the `invalid` opcode validate in future (and giving it new semantics) wouldn't be strictly considered a breaking change.
I think the important point is whether or not we think it's a good idea to express (through any mechanism) an intent never to use `0xFF`.
Isn't it such a beautiful contradiction to say that `invalid` is a valid text instruction that encodes to a binary that must be rejected as invalid? Yes, yes it is :-)
Other than that, I agree that doing so would be a codified exception to our implicit "invalid programs could possibly become valid in the future" rule. However, I think it's also reasonable to assume that there are programs with egregious errors, like unfixable type errors such as misaligned operand stacks, that we don't expect to ever make valid in the future.
To bikeshed, there's also a world in which `invalid` doesn't show up in the instruction list of the AST, but has an explicit entry in the text format and binary format instruction lists as being an error (it's helpful that we deliberately don't distinguish or enforce any precedence between decode/parse/validation errors in the spec, so implementations would still be free to report its presence as a "validation" error).
I think the important point is whether or not we think it's a good idea to express (through any mechanism) an intent never to use `0xFF`.
Addressing this core question, I would prefer to stick with the original decision, which was to reserve 0xFF for future use as an alternative escape hatch when we otherwise exhaust the opcode/prefix space. That being said, I don't really understand the use case for permanently reserved (private) opcodes in engines.
I don't really understand the use case for permanently reserved (private) opcodes in engines.
E.g. it is used in Wizard to overwrite instructions which have been instrumented so that the interpreter dispatches to a handler that invokes said instrumentation. The illegal instruction needs to be a single byte in order to properly instrument one-byte instructions without overwriting a portion of the next instruction, which, e.g., could be a branch target like `loop`, `else`, or `end`.
Hi there. JSC engineer here. I too am in favor of reserving such an opcode mainly for the VM implementation's use. The important ask here is not that the opcode encoding remains invalid so much as that the opcode encoding remains unique and available in the top-level set of 256 opcodes. If the discomfort here is with naming it `invalid`, how about naming it `trap` instead? The semantic for this `trap` opcode will be that it crashes the program. The VM may choose to repurpose this `trap` opcode in creative ways (as we do with x64 and ARM breakpoint instructions) but that is beside the point. Is adding a `trap` opcode an option?
That being said, I don't really understand the use case for permanently reserved (private) opcodes in engines.
SpiderMonkey internally converts asm.js into wasm bytecode for execution. We (mostly) use the 0xFF prefix as a place to put asm.js specific operations that have no wasm equivalent [1].
We could find other ways to encode this into wasm bytecode if 0xFF was not available, but they're slightly more complicated.
[1] https://searchfox.org/mozilla-central/rev/6ec440e105c2b75d5cae9d34f957a2f85a106d54/js/src/wasm/WasmConstants.h#902
V8 also internally translates asm.js into Wasm along the same lines as @eqrion described.
@MenloDorian How would `trap` differ from `unreachable`, which also traps?
@MenloDorian How would `trap` differ from `unreachable`, which also traps?
@titzer Oh, you are right. `unreachable` will do the trick for breakpoints or escapes. Is there a reason why Wizard and SpiderMonkey can't use this instead of `0xFF`?
@eqrion:
By creating an instruction for it, the spec is creating a concept that users could actually write in the text format.
They already can do that, in both text and binary. And it will be rejected as a syntax/decode error. With your change it will be rejected as invalid. But users cannot distinguish between decode/parse and validation errors, because it's a single phase in most engines. So this makes no observable difference whatsoever. An engine can even implement both interchangeably!
@conrad-watt:
To bikeshed, there's also a world in which invalid doesn't show up in the instruction list of the AST, but has an explicit entry in the text format and binary format instruction lists as being an error
I don't understand. The way something is spec'ed as a syntax error is by having no grammar rule for it. Which is exactly the status quo.
I don't understand. The way something is spec'ed as a syntax error is by having no grammar rule for it. Which is exactly the status quo.
Ultimately this is just about communicating intent in a committal-looking manner - these entries would be hooks for non-normative notes, as well as nudges that an implementation might want to report a distinguished error if `0xFF` is encountered in instruction position (text or binary).
I agree that it's not strictly necessary. As one datapoint for bikeshedding, it seems like the JVM spec does this.
We would have to invent a whole new way of specifying syntax errors. That's total overkill for something that does not even have any technical effect.
If we agree this is just expressing intent, then why not handle it as such? What the JVM does is nothing more really.
@MenloDorian How would `trap` differ from `unreachable`, which also traps?
@titzer Oh, you are right. `unreachable` will do the trick for breakpoints or escapes. Is there a reason why Wizard and SpiderMonkey can't use this instead of `0xFF`?
I guess while it's possible to use `unreachable` as an escape trigger (for breakpoint / slow-path triggers for in-place execution), it's still more efficient to have a dedicated opcode like `0xFF`. Using `unreachable` would require extra disambiguation (via some metadata) of whether the intent is actually to execute an escape or `unreachable` semantics. While the execution speed here may be acceptable (for breakpoints / slow paths), using `unreachable` may necessitate the allocation of that metadata for each `unreachable` opcode. Though the amount may be negligible (since `unreachable` is presumably rare), this is not ideal.
@MenloDorian
Repurposing `unreachable` is possible, but pretty ugly, because its meaning would then be ambiguous to the interpreter. Wizard makes a copy of the original bytes when a probe is inserted so that it can remember what the original opcode was, to redispatch on it after running instrumentation. So `unreachable` would behave as if it were always instrumented, the runtime would have to be prepared to not find instrumentation there, and the redispatch sequence would have to use a special table where `unreachable` generates a trap, not the trigger for instrumentation (otherwise infinite loop).
An invalid opcode is really a lot cleaner solution.