move
move copied to clipboard
[format] Add text format for CompiledModule and provide a tool to convert between binary and text bytecode
Motivation
- Add text format for CompiledModule and CompiledScript
- Provide a tool to convert binary move bytecode to text move bytecode
To easily edit the move bytecode and do some tests. In the future, maybe this text format can be used in the javascript environment.
Usage:
- move sandbox bytecode-converter -i move-stdlib/build/MoveStdlib/bytecode_modules/ascii.mv
- move sandbox bytecode-converter -i ascii.json
data:image/s3,"s3://crabby-images/9762e/9762e50e336c55c26dccf84cd88a2805a090dfe3" alt="Screen Shot 2022-09-27 at 23 54 12"
(Write your motivation for proposed changes here.)
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
CI
Love this!
Could you talk a bit more about the motivation for this tool?
- If we want to exactly describe a CompiledModule, we can do that in Rust code. Which I find a bit easier to read than the JSON, but I know this might differ per person.
- If we want an easier to read CompiledModule that might not be correct, we have the IR. And we have a long standing task to add bytecode syntax to the IR (it is already present in the syntax). Extending the IR in this way will let us test the spots not currently testable with the IR, which are linking/dependency errors and out-of-bounds/index errors
Could you talk a bit more about the motivation for this tool?
- If we want to exactly describe a CompiledModule, we can do that in Rust code. Which I find a bit easier to read than the JSON, but I know this might differ per person.
- If we want an easier to read CompiledModule that might not be correct, we have the IR. And we have a long standing task to add bytecode syntax to the IR (it is already present in the syntax). Extending the IR in this way will let us test the spots not currently testable with the IR, which are linking/dependency errors and out-of-bounds/index errors
I read some module bytecode with move-disassembler, and want to edit it to do some verifier/VM tests such as removing a special bytecode to test the VM stake balance.
I found that the disassembler defines the bytecode's ToString
, but no FromStr
.
This PR mainly defines the bytecode's text presentation standard and supports text <-> binary convert.
I think the text standard is useful for the bytecode editor or for displaying bytecode on websites, such as block explorer.
I read some module bytecode with move-disassembler, and want to edit it to do some verifier/VM tests such as removing a special bytecode to test the VM stake balance.
The stack balance can be tested with the move IR. See: https://github.com/move-language/move/tree/main/language/move-bytecode-verifier/transactional-tests/tests/stack_usage_verifier And for in Rust tests: https://github.com/move-language/move/blob/main/language/move-bytecode-verifier/bytecode-verifier-tests/src/unit_tests/negative_stack_size_tests.rs
I really need this
@jolestar how will it behave if there's more than one visibility modifier? For example:
module beep::boop {
public(friend) native fun beep();
}
@jolestar how will it behave if there's more than one visibility modifier? For example:
module beep::boop { public(friend) native fun beep(); }
I give an example:
module std::test {
public(friend) native fun beep();
}
In the bytecode, public(friend)
visibility is friend
.
json format bytecode:
{
"version": 5,
"self_module_handle_idx": 0,
"module_handles": [
{
"address": 0,
"name": 0
}
],
"struct_handles": [],
"function_handles": [
{
"module": 0,
"name": 1,
"parameters": 0,
"return_": 0,
"type_parameters": []
}
],
"field_handles": [],
"friend_decls": [],
"struct_def_instantiations": [],
"function_instantiations": [],
"field_instantiations": [],
"signatures": [
[]
],
"identifiers": [
"test",
"beep"
],
"address_identifiers": [
"00000000000000000000000000000001"
],
"constant_pool": [],
"metadata": [],
"struct_defs": [],
"function_defs": [
{
"function": 0,
"visibility": "Friend",
"is_entry": false,
"acquires_global_resources": [],
"code": null
}
]
}
cc @tnowacki
Another example I could think of is: how to define a module where its self module handle is pointing to a different module?
Nonetheless, I do not want to add Serialize, Deserialize for the file format types. These types have a hand rolled serialization format as defined in the serializer/deserializer.
I do share the concern here. Do you think that adding the derive for a none standard feature flag only resolve your concern? @tnowacki
Nonetheless, I do not want to add Serialize, Deserialize for the file format types. These types have a hand rolled serialization format as defined in the serializer/deserializer.
I do share the concern here. Do you think that adding the derive for a none standard feature flag only resolve your concern? @tnowacki
I don't think that would be a great solution as it could negatively affect future work. As such, I think it would be safest that any API, like the one in this PR, needs to not affect the types defined in the file format.
Another example I could think of is: how to define a module where its self module handle is pointing to a different module?
We already do this in Rust tests by just constructing the compiled module in Rust syntax
We already do this in Rust tests by just constructing the compiled module in Rust syntax
Doing this in Rust is hard if we want to construct a complex module.
This tool is useful when I audit the MoveVM verifier logic, I use this tool to try:
- Construct two modules with the same layout Struct, and try to mix the two types when using global storage instruction.
- Remove and add some instructions, let the stake keep balance, but try to pack/unpack an invalid struct, and so on.
Other use cases:
- let basic module struct support serde, we can use the code generator to generate code for other languages.
- Display a text format module on the block explorer, developer tools, etc.
In summary, I think we need a human-editable module format.
if we do not want to keep this tool in the Move repo, I can implement it as an independent tool, but can we just keep the ToString
and FromStr
for Bytecode in the Move repo?
I really would love to have round trip serialization of the exact CompiledModule. Even though we have debug prints, they are not parsable in Rust. This makes it awkward e.g. to construct modules from fuzzer results.
I'm not sure whats the problem is if we do #[cfg_attr(feature = "with_serde", derive(Serialize, Deserialize))]
. Then people can add roundtrip serializers for debugging and developing purposes. This is a different use case than MVIR.
I really would love to have round trip serialization of the exact CompiledModule. Even though we have debug prints, they are not parsable in Rust. This makes it awkward e.g. to construct modules from fuzzer results.
I'm not sure whats the problem is if we do
#[cfg_attr(feature = "with_serde", derive(Serialize, Deserialize))]
. Then people can add roundtrip serializers for debugging and developing purposes. This is a different use case than MVIR.
I see. The roundtripping from automatted tools like fuzzing is a very good motivation. And I agree very different than the MVIR.
I'm mostly worried about accidental misuse with serialization. Maybe we can just put it behind a test only flag or something? EDIT: I suppose test only won't work well for certain tools... Any ideas how to make this very painful to accidentally leak into the VM?
I really would love to have round trip serialization of the exact CompiledModule. Even though we have debug prints, they are not parsable in Rust. This makes it awkward e.g. to construct modules from fuzzer results. I'm not sure whats the problem is if we do
#[cfg_attr(feature = "with_serde", derive(Serialize, Deserialize))]
. Then people can add roundtrip serializers for debugging and developing purposes. This is a different use case than MVIR.I see. The roundtripping from automatted tools like fuzzing is a very good motivation. And I agree very different than the MVIR.
I'm mostly worried about accidental misuse with serialization. Maybe we can just put it behind a test only flag or something? EDIT: I suppose test only won't work well for certain tools... Any ideas how to make this very painful to accidentally leak into the VM?
Maybe a impl Serialize/Deserialize for CompiledModule
and make it just panic and add a test case to make sure it fails?
We already do this in Rust tests by just constructing the compiled module in Rust syntax
This was a bit hard to work with our transactional test framework. We had a framework to test multiple components including verifier, vm, compiler, etc. However, I found it a bit hard to test verifier/vm with an aribitrary bytecode, at least not as easy as how the rest of the system is tested right now using file based tests. I hope that using this serializer we would be able to write more rigorous tests for our VM/verifier.
I'm mostly worried about accidental misuse with serialization. Maybe we can just put it behind a test only flag or something? EDIT: I suppose test only won't work well for certain tools... Any ideas how to make this very painful to accidentally leak into the VM?
Maybe use a feature flag like feature = "with_seride_module_experimental"
, make clear the feature is experimental, and do not maintain compatibility in the future?
Or we make Deserialize
as a default feature and let Serialize
be flagged by a specific feature. Because the misuse usually is when serialized.
However, I found it a bit hard to test verifier/vm with an aribitrary bytecode,
This is my main issue with the PR.
If we want a tool for automated tooling round tripping, super, that sounds like a good use case.
If we want arbitrary bytecode, this is not the right format. This can be done already (just not cleanly) with the IR. For example, () + ()
compiles to Add
with nothing else pushed on the stack. The same can be done for (I believe) every other operation. So if you want to really percisely do some funny business on the stack, you can, e.g. (5); x = 0; (&x); *(); () + ();
. Obviously this isn't super readable. And we have had a longstanding task to add syntax for the pre-existing support for direct bytecode instructions. And direct declaration of struct/function/module handles. The IR compiler does this, it's just not exposed in syntax. The exception will always be the bounds checks, but those are already easily stressed with proptests and doesn't seem like something we need to be able to test in the transactional tests.
Maybe use a feature flag like feature = "with_seride_module_experimental", make clear the feature is experimental, and do not maintain compatibility in the future?
I don't think this feature needs to be experimental. It seems like a fine thing to have around if you have some automated test results you want to save or edit.
My main concern is accidental usage. But if it is something like CompiledModule::to_json
thats fine. If it's something like CompiledModule::ser
thats an issue.
My main concern is accidental usage. But if it is something like
CompiledModule::to_json
thats fine. If it's something likeCompiledModule::ser
thats an issue.
Ok, I find a way to provide CompiledModule::to_json
, and no Serialize
on CompiledModule.
@tnowacki
- Add
CompiledModule::to_json
,CompiledModule::from_json
,CompiledScript::to_json
,CompiledScript::from_json
, andCompiledModule
,CompiledScript
no serde derive. - Add the
bytecode-converter
command tomove sandbox
.