parity-wasm
parity-wasm copied to clipboard
Changing widths of integers across round-trips breaks relocations
I pointed out in https://github.com/paritytech/parity-wasm/pull/196#issuecomment-371278025 that LLVM/lld sometimes compiles integers to larger representations than they need. That particular case was an i32.const 0
compiled as 0x41 0x80 0x80 0x80 0x80 0x00
instead of just 0x41 0x00
. parity-wasm can read it just fine, but it serializes it back out as 0x41 0x00
instead of the original representation.
This is a problem for relocations since relocations are measured in byte offsets from the start of the corresponding section. So a round-trip breaks all relocations after such a value.
It seems to me from testing that lld does this for every relocatable value, probably intentionally so that loaders / linkers doing have space to write the relocated value without needing to move bytes around.
How do you think this should be fixed?
-
Each
RelocationEntry
could have some reference to the correspondingVar*Int*
(or to the section containing theVar*Int*
. But this crosses section boundaries and be brittle. -
Or, each
Var*Int
could store the width it originally had, and the serializer could artificially expand it to that width. But this breaks theFrom
andInto
impls for these types.
I believe we can just use separate type with custom serialization for such cases
Is it really about original length?
As far as I remember, wabt has a switch like “non canonical lebs” for such cases, and if it set then all lebs will be encoded with it’s maximal length.
Is it really about original length?
Yes. Only relocatable addresses are being emitted as 5-byte values. Non-address constants are being emitted as the smallest width. See res/cases/v1/relocatable.wasm
that I added in my PR - there is (call $console_log (i32.const 0) (i32.const 2))
where the 0
is 5 bytes (address of relocatable global) and the 2
is 1 byte (string length of that global).
I suggest to just add VarInt32NonCan or similar, and let it serialize to full 5 bytes always or remember it's deserialized length (not sure the latter is even required)
It does mean that the const opcodes like I32Const
and the load/store opcodes like I32Load
will have to change to hold VarInt32NonCan
instead of i32
/ u32
Yeah, this is no-go, i thought it was happening in relocation sections only
Another solution will require a lot of work code/global section can be generic over VarInt32/VarInt32NonCan(or "VarInt32FixedLen"), with regular VarInt32 being default
Not sure if it worth troubles
How would that help? Like I said earlier, some opcodes contain smallest width and some opcodes contain largest width. It's not always one or always the other.
I'm thinking of storing a map of original offset -> opcode index in the code section and having the relocation section serialize differently based on how the code section was serialized. Haven't yet written the code to see how feasible it is because it means I need to have a custom Writer
to be able to get the new opcode offsets out from CodeSection::serialize
and then somehow propagate them to RelocSection::serialize
...
something like this: https://gist.github.com/anonymous/da3588dedc6427ec1c49e804a26ec12f#file-playground-rs
but surely it is not worth it
-
Parameterizing the whole
Opcode
means that opcodes likeI32Load(u32, u32)
can't have different widths for their two values. -
Opcodes
needs to hold both type ofOpcode
like I said above so you cannot parameterize it.
You just add two width fields (if both operands actually do require so), and you don't need to hold both types of Opcode, you just use width-preserving variant whenever you need it, and use it for all opcodes (since its abilities to serialize match those of regular variant if the representation is the most compact originally).
I made a prototype of how it would look for individual opcodes to store the original width with the value: https://github.com/Arnavion/parity-wasm/commits/min-width I only did it for I32Const
and Call
, the two opcodes that relocatable.wasm
uses. I don't think it's particularly invasive. Tell me what you think.
That said, I want parity-wasm to be able to relocate modules in the future, which means it needs to be able to update these relocated values arbitrarily. I don't know if I want to depend on lld to always leave enough space like it does now. So I'm still going to try to implement my idea of storing a map in the CodeSection for original offset -> opcode, and changing the offsets in the RelocSection according to how the CodeSection is serialized.