move icon indicating copy to clipboard operation
move copied to clipboard

[format] Add text format for CompiledModule and provide a tool to convert between binary and text bytecode

Open jolestar opened this issue 2 years ago • 21 comments

Motivation

  1. Add text format for CompiledModule and CompiledScript
  2. Provide a tool to convert binary move bytecode to text move bytecode

To easily edit the move bytecode and do some tests. In the future, maybe this text format can be used in the javascript environment.

Usage:

  • move sandbox bytecode-converter -i move-stdlib/build/MoveStdlib/bytecode_modules/ascii.mv
  • move sandbox bytecode-converter -i ascii.json
Screen Shot 2022-09-27 at 23 54 12

(Write your motivation for proposed changes here.)

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

CI

jolestar avatar Sep 27 '22 16:09 jolestar

Love this!

runtian-zhou avatar Sep 27 '22 18:09 runtian-zhou

Could you talk a bit more about the motivation for this tool?

  • If we want to exactly describe a CompiledModule, we can do that in Rust code. Which I find a bit easier to read than the JSON, but I know this might differ per person.
  • If we want an easier to read CompiledModule that might not be correct, we have the IR. And we have a long standing task to add bytecode syntax to the IR (it is already present in the syntax). Extending the IR in this way will let us test the spots not currently testable with the IR, which are linking/dependency errors and out-of-bounds/index errors

tnowacki avatar Sep 27 '22 21:09 tnowacki

Could you talk a bit more about the motivation for this tool?

  • If we want to exactly describe a CompiledModule, we can do that in Rust code. Which I find a bit easier to read than the JSON, but I know this might differ per person.
  • If we want an easier to read CompiledModule that might not be correct, we have the IR. And we have a long standing task to add bytecode syntax to the IR (it is already present in the syntax). Extending the IR in this way will let us test the spots not currently testable with the IR, which are linking/dependency errors and out-of-bounds/index errors

I read some module bytecode with move-disassembler, and want to edit it to do some verifier/VM tests such as removing a special bytecode to test the VM stake balance.

I found that the disassembler defines the bytecode's ToString, but no FromStr.

This PR mainly defines the bytecode's text presentation standard and supports text <-> binary convert.

I think the text standard is useful for the bytecode editor or for displaying bytecode on websites, such as block explorer.

jolestar avatar Sep 27 '22 23:09 jolestar

I read some module bytecode with move-disassembler, and want to edit it to do some verifier/VM tests such as removing a special bytecode to test the VM stake balance.

The stack balance can be tested with the move IR. See: https://github.com/move-language/move/tree/main/language/move-bytecode-verifier/transactional-tests/tests/stack_usage_verifier And for in Rust tests: https://github.com/move-language/move/blob/main/language/move-bytecode-verifier/bytecode-verifier-tests/src/unit_tests/negative_stack_size_tests.rs

tnowacki avatar Sep 28 '22 02:09 tnowacki

I really need this

wow-sven avatar Sep 28 '22 14:09 wow-sven

@jolestar how will it behave if there's more than one visibility modifier? For example:

module beep::boop {
  public(friend) native fun beep();
}

damirka avatar Sep 29 '22 14:09 damirka

@jolestar how will it behave if there's more than one visibility modifier? For example:

module beep::boop {
  public(friend) native fun beep();
}

I give an example:

module std::test {
  public(friend) native fun beep();
}

In the bytecode, public(friend) visibility is friend.

json format bytecode:

{
    "version": 5,
    "self_module_handle_idx": 0,
    "module_handles": [
        {
            "address": 0,
            "name": 0
        }
    ],
    "struct_handles": [],
    "function_handles": [
        {
            "module": 0,
            "name": 1,
            "parameters": 0,
            "return_": 0,
            "type_parameters": []
        }
    ],
    "field_handles": [],
    "friend_decls": [],
    "struct_def_instantiations": [],
    "function_instantiations": [],
    "field_instantiations": [],
    "signatures": [
        []
    ],
    "identifiers": [
        "test",
        "beep"
    ],
    "address_identifiers": [
        "00000000000000000000000000000001"
    ],
    "constant_pool": [],
    "metadata": [],
    "struct_defs": [],
    "function_defs": [
        {
            "function": 0,
            "visibility": "Friend",
            "is_entry": false,
            "acquires_global_resources": [],
            "code": null
        }
    ]
}

jolestar avatar Sep 29 '22 14:09 jolestar

cc @tnowacki

runtian-zhou avatar Oct 07 '22 04:10 runtian-zhou

Another example I could think of is: how to define a module where its self module handle is pointing to a different module?

runtian-zhou avatar Oct 13 '22 17:10 runtian-zhou

Nonetheless, I do not want to add Serialize, Deserialize for the file format types. These types have a hand rolled serialization format as defined in the serializer/deserializer.

I do share the concern here. Do you think that adding the derive for a none standard feature flag only resolve your concern? @tnowacki

runtian-zhou avatar Oct 13 '22 21:10 runtian-zhou

Nonetheless, I do not want to add Serialize, Deserialize for the file format types. These types have a hand rolled serialization format as defined in the serializer/deserializer.

I do share the concern here. Do you think that adding the derive for a none standard feature flag only resolve your concern? @tnowacki

I don't think that would be a great solution as it could negatively affect future work. As such, I think it would be safest that any API, like the one in this PR, needs to not affect the types defined in the file format.

Another example I could think of is: how to define a module where its self module handle is pointing to a different module?

We already do this in Rust tests by just constructing the compiled module in Rust syntax

tnowacki avatar Oct 13 '22 21:10 tnowacki

We already do this in Rust tests by just constructing the compiled module in Rust syntax

Doing this in Rust is hard if we want to construct a complex module.

This tool is useful when I audit the MoveVM verifier logic, I use this tool to try:

  1. Construct two modules with the same layout Struct, and try to mix the two types when using global storage instruction.
  2. Remove and add some instructions, let the stake keep balance, but try to pack/unpack an invalid struct, and so on.

Other use cases:

  1. let basic module struct support serde, we can use the code generator to generate code for other languages.
  2. Display a text format module on the block explorer, developer tools, etc.

In summary, I think we need a human-editable module format.

if we do not want to keep this tool in the Move repo, I can implement it as an independent tool, but can we just keep the ToString and FromStr for Bytecode in the Move repo?

jolestar avatar Oct 14 '22 01:10 jolestar

I really would love to have round trip serialization of the exact CompiledModule. Even though we have debug prints, they are not parsable in Rust. This makes it awkward e.g. to construct modules from fuzzer results.

I'm not sure whats the problem is if we do #[cfg_attr(feature = "with_serde", derive(Serialize, Deserialize))]. Then people can add roundtrip serializers for debugging and developing purposes. This is a different use case than MVIR.

wrwg avatar Oct 17 '22 02:10 wrwg

I really would love to have round trip serialization of the exact CompiledModule. Even though we have debug prints, they are not parsable in Rust. This makes it awkward e.g. to construct modules from fuzzer results.

I'm not sure whats the problem is if we do #[cfg_attr(feature = "with_serde", derive(Serialize, Deserialize))]. Then people can add roundtrip serializers for debugging and developing purposes. This is a different use case than MVIR.

I see. The roundtripping from automatted tools like fuzzing is a very good motivation. And I agree very different than the MVIR.

I'm mostly worried about accidental misuse with serialization. Maybe we can just put it behind a test only flag or something? EDIT: I suppose test only won't work well for certain tools... Any ideas how to make this very painful to accidentally leak into the VM?

tnowacki avatar Oct 17 '22 17:10 tnowacki

I really would love to have round trip serialization of the exact CompiledModule. Even though we have debug prints, they are not parsable in Rust. This makes it awkward e.g. to construct modules from fuzzer results. I'm not sure whats the problem is if we do #[cfg_attr(feature = "with_serde", derive(Serialize, Deserialize))]. Then people can add roundtrip serializers for debugging and developing purposes. This is a different use case than MVIR.

I see. The roundtripping from automatted tools like fuzzing is a very good motivation. And I agree very different than the MVIR.

I'm mostly worried about accidental misuse with serialization. Maybe we can just put it behind a test only flag or something? EDIT: I suppose test only won't work well for certain tools... Any ideas how to make this very painful to accidentally leak into the VM?

Maybe a impl Serialize/Deserialize for CompiledModule and make it just panic and add a test case to make sure it fails?

runtian-zhou avatar Oct 17 '22 19:10 runtian-zhou

We already do this in Rust tests by just constructing the compiled module in Rust syntax

This was a bit hard to work with our transactional test framework. We had a framework to test multiple components including verifier, vm, compiler, etc. However, I found it a bit hard to test verifier/vm with an aribitrary bytecode, at least not as easy as how the rest of the system is tested right now using file based tests. I hope that using this serializer we would be able to write more rigorous tests for our VM/verifier.

runtian-zhou avatar Oct 17 '22 22:10 runtian-zhou

I'm mostly worried about accidental misuse with serialization. Maybe we can just put it behind a test only flag or something? EDIT: I suppose test only won't work well for certain tools... Any ideas how to make this very painful to accidentally leak into the VM?

Maybe use a feature flag like feature = "with_seride_module_experimental", make clear the feature is experimental, and do not maintain compatibility in the future?

Or we make Deserialize as a default feature and let Serialize be flagged by a specific feature. Because the misuse usually is when serialized.

jolestar avatar Oct 18 '22 00:10 jolestar

However, I found it a bit hard to test verifier/vm with an aribitrary bytecode,

This is my main issue with the PR.

If we want a tool for automated tooling round tripping, super, that sounds like a good use case.

If we want arbitrary bytecode, this is not the right format. This can be done already (just not cleanly) with the IR. For example, () + () compiles to Add with nothing else pushed on the stack. The same can be done for (I believe) every other operation. So if you want to really percisely do some funny business on the stack, you can, e.g. (5); x = 0; (&x); *(); () + ();. Obviously this isn't super readable. And we have had a longstanding task to add syntax for the pre-existing support for direct bytecode instructions. And direct declaration of struct/function/module handles. The IR compiler does this, it's just not exposed in syntax. The exception will always be the bounds checks, but those are already easily stressed with proptests and doesn't seem like something we need to be able to test in the transactional tests.

tnowacki avatar Oct 18 '22 18:10 tnowacki

Maybe use a feature flag like feature = "with_seride_module_experimental", make clear the feature is experimental, and do not maintain compatibility in the future?

I don't think this feature needs to be experimental. It seems like a fine thing to have around if you have some automated test results you want to save or edit.
My main concern is accidental usage. But if it is something like CompiledModule::to_json thats fine. If it's something like CompiledModule::ser thats an issue.

tnowacki avatar Oct 18 '22 18:10 tnowacki

My main concern is accidental usage. But if it is something like CompiledModule::to_json thats fine. If it's something like CompiledModule::ser thats an issue.

Ok, I find a way to provide CompiledModule::to_json, and no Serialize on CompiledModule.

jolestar avatar Oct 19 '22 01:10 jolestar

@tnowacki

  • Add CompiledModule::to_json, CompiledModule::from_json, CompiledScript::to_json, CompiledScript::from_json, and CompiledModule, CompiledScript no serde derive.
  • Add the bytecode-converter command to move sandbox.

jolestar avatar Nov 02 '22 15:11 jolestar