parity-wasm icon indicating copy to clipboard operation
parity-wasm copied to clipboard

Changing widths of integers across round-trips breaks relocations

Open Arnavion opened this issue 6 years ago • 12 comments

I pointed out in https://github.com/paritytech/parity-wasm/pull/196#issuecomment-371278025 that LLVM/lld sometimes compiles integers to larger representations than they need. That particular case was an i32.const 0 compiled as 0x41 0x80 0x80 0x80 0x80 0x00 instead of just 0x41 0x00. parity-wasm can read it just fine, but it serializes it back out as 0x41 0x00 instead of the original representation.

This is a problem for relocations since relocations are measured in byte offsets from the start of the corresponding section. So a round-trip breaks all relocations after such a value.

It seems to me from testing that lld does this for every relocatable value, probably intentionally so that loaders / linkers doing have space to write the relocated value without needing to move bytes around.

How do you think this should be fixed?

  1. Each RelocationEntry could have some reference to the corresponding Var*Int* (or to the section containing the Var*Int*. But this crosses section boundaries and be brittle.

  2. Or, each Var*Int could store the width it originally had, and the serializer could artificially expand it to that width. But this breaks the From and Into impls for these types.

Arnavion avatar Mar 12 '18 02:03 Arnavion

I believe we can just use separate type with custom serialization for such cases

NikVolf avatar Mar 12 '18 07:03 NikVolf

Is it really about original length?

As far as I remember, wabt has a switch like “non canonical lebs” for such cases, and if it set then all lebs will be encoded with it’s maximal length.

pepyakin avatar Mar 12 '18 09:03 pepyakin

Is it really about original length?

Yes. Only relocatable addresses are being emitted as 5-byte values. Non-address constants are being emitted as the smallest width. See res/cases/v1/relocatable.wasm that I added in my PR - there is (call $console_log (i32.const 0) (i32.const 2)) where the 0 is 5 bytes (address of relocatable global) and the 2 is 1 byte (string length of that global).

Arnavion avatar Mar 12 '18 10:03 Arnavion

I suggest to just add VarInt32NonCan or similar, and let it serialize to full 5 bytes always or remember it's deserialized length (not sure the latter is even required)

NikVolf avatar Mar 12 '18 12:03 NikVolf

It does mean that the const opcodes like I32Const and the load/store opcodes like I32Load will have to change to hold VarInt32NonCan instead of i32 / u32

Arnavion avatar Mar 12 '18 18:03 Arnavion

Yeah, this is no-go, i thought it was happening in relocation sections only

NikVolf avatar Mar 12 '18 22:03 NikVolf

Another solution will require a lot of work code/global section can be generic over VarInt32/VarInt32NonCan(or "VarInt32FixedLen"), with regular VarInt32 being default

Not sure if it worth troubles

NikVolf avatar Mar 13 '18 16:03 NikVolf

How would that help? Like I said earlier, some opcodes contain smallest width and some opcodes contain largest width. It's not always one or always the other.

I'm thinking of storing a map of original offset -> opcode index in the code section and having the relocation section serialize differently based on how the code section was serialized. Haven't yet written the code to see how feasible it is because it means I need to have a custom Writer to be able to get the new opcode offsets out from CodeSection::serialize and then somehow propagate them to RelocSection::serialize...

Arnavion avatar Mar 13 '18 17:03 Arnavion

something like this: https://gist.github.com/anonymous/da3588dedc6427ec1c49e804a26ec12f#file-playground-rs

but surely it is not worth it

NikVolf avatar Mar 13 '18 22:03 NikVolf

  • Parameterizing the whole Opcode means that opcodes like I32Load(u32, u32) can't have different widths for their two values.

  • Opcodes needs to hold both type of Opcode like I said above so you cannot parameterize it.

Arnavion avatar Mar 14 '18 02:03 Arnavion

You just add two width fields (if both operands actually do require so), and you don't need to hold both types of Opcode, you just use width-preserving variant whenever you need it, and use it for all opcodes (since its abilities to serialize match those of regular variant if the representation is the most compact originally).

NikVolf avatar Mar 14 '18 20:03 NikVolf

I made a prototype of how it would look for individual opcodes to store the original width with the value: https://github.com/Arnavion/parity-wasm/commits/min-width I only did it for I32Const and Call, the two opcodes that relocatable.wasm uses. I don't think it's particularly invasive. Tell me what you think.

That said, I want parity-wasm to be able to relocate modules in the future, which means it needs to be able to update these relocated values arbitrarily. I don't know if I want to depend on lld to always leave enough space like it does now. So I'm still going to try to implement my idea of storing a map in the CodeSection for original offset -> opcode, and changing the offsets in the RelocSection according to how the CodeSection is serialized.

Arnavion avatar Mar 18 '18 03:03 Arnavion