zkevm-circuits
zkevm-circuits copied to clipboard
Reuse type deserializers for json and yaml parsing in testool
Currently in testool we have repeated code to parse types holding words, addresses, etc. for json and yaml:
- https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/673f5fe6e9e76ce3796a3f5b1d15d7c49ced66cd/testool/src/statetest/json.rs#L298
- https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/673f5fe6e9e76ce3796a3f5b1d15d7c49ced66cd/testool/src/statetest/yaml.rs#L292
We can unify the parsers by implementing a type that deserializes from a string.
We can also investigate defining a struct with custom types that implement the Deserialize trait so that we don't have to parse each type manually.
Related to https://github.com/privacy-scaling-explorations/zkevm-circuits/pull/748#discussion_r968433533
Good idea, I am working on a PR for this
Hi @maschad, do you have any update on this task?
Hey @ed255 sorry I've been pretty swamped over the last few months, perhaps I should pass this task on to someone else.
Hey @ed255 sorry I've been pretty swamped over the last few months, perhaps I should pass this task on to someone else.
No problem! Thanks for the update, I'll unassign the task for now to signal that no one is currently working on it :)
@ed255 I would like to take this issue. Just to clarify, do we need to create one function that takes in a &str as input and deserializes from a string (we could add a boolean to indicate the type)? We would need to convert from string -> yaml again because the logic requires it in yaml's parse_address(), so how do we convert back to a YAML object?
@ed255 I would like to take this issue. Just to clarify, do we need to create one function that takes in a &str as input and deserializes from a string (we could add a boolean to indicate the type)? We would need to convert from string -> yaml again because the logic requires it in yaml's parse_address(), so how do we convert back to a YAML object?
I think a first step is to have functions that take &str and return a Result<Type> for the following types:
- address
- to_address
- bytes
- code
- calldata
- hash
- u256
- u64
In fact these functions already exist in the testool/src/statetest/json.rs. This functions are generic (not specific to json), so they can be moved to a new file: testool/src/statetest/parse.rs and then they can be imported and used in testool/src/statetest/json.rs and also in testool/src/statetest/yaml.rs. In the yaml implementation, every time a parse_foo function does yaml.as_str() instead of repeating the code, it should just call the parse_type function that now would live in parse.rs.
I don't think we need to go from string back to YAML. I do see that the json parsers only take strings but the yaml parsers can take strings, integers, and other things; so my suggestion to start with this issue is to just reuse the string parsers in the string cases of the yaml parsers.
Example in yaml.rs:
Now:
fn parse_address(yaml: &Yaml) -> Result<Address> {
if let Some(as_str) = yaml.as_str() {
if let Some(hex) = as_str.strip_prefix("0x") {
Ok(Address::from_slice(&hex::decode(hex)?))
} else {
Ok(Address::from_slice(&hex::decode(as_str)?))
}
} else if let Some(as_i64) = yaml.as_i64() {
let hex = format!("{:0>40}", as_i64);
Ok(Address::from_slice(&hex::decode(hex)?))
} else if let Yaml::Real(as_real) = yaml {
Ok(Address::from_str(as_real)?)
} else {
bail!("cannot parse address {:?}", yaml);
}
}
After:
fn parse_address(yaml: &Yaml) -> Result<Address> {
if let Some(as_str) = yaml.as_str() {
parse::parse_address(as_str)
} else if let Some(as_i64) = yaml.as_i64() {
let hex = format!("{:0>40}", as_i64);
Ok(Address::from_slice(&hex::decode(hex)?))
} else if let Yaml::Real(as_real) = yaml {
Ok(Address::from_str(as_real)?)
} else {
bail!("cannot parse address {:?}", yaml);
}
}
@ed255 This is very helpful thanks! address, to_address, bytes, hash, u256, u64 seem doable, but the rest like parse_calldata and parse_code seem to have a few discrepancies between yaml and json version (e.g yaml does not mention asm as tags). Most importantly, they require the instance to use the compiler property. I could just pass in a struct similar/same as JsonStateTestBuilder, or create a new struct in parse.rs that can be somehow integrated with JsonStateTestBuilder and YamlStateTestBuilder. Any thoughts?
PR. Finished address, to_address, bytes, hash, u256, u64 but still pending advice for parse code and calldata, as mentioned above.
@ed255 This is very helpful thanks! address, to_address, bytes, hash, u256, u64 seem doable, but the rest like parse_calldata and parse_code seem to have a few discrepancies between yaml and json version (e.g yaml does not mention asm as tags). Most importantly, they require the instance to use the
compilerproperty. I could just pass in a struct similar/same as JsonStateTestBuilder, or create a new struct in parse.rs that can be somehow integrated with JsonStateTestBuilder and YamlStateTestBuilder. Any thoughts?
About parse_calldata, I noticed they use a decompose_tags function internally which is duplicated in the yaml and json parsers, so that can be deduplicated. And then there are these two chunks that look very similar, and I think could be reworked to reuse code as well:
https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/9400c6237bd3efb9bc979b49fb0f25df7d763853/testool/src/statetest/json.rs#L372-L395
https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/9400c6237bd3efb9bc979b49fb0f25df7d763853/testool/src/statetest/yaml.rs#L344-L371
The parse_code functions look a bit different, but it seems that one parses a superset of the other? I think they could be unified, at least the part that starts with let mut code = if let Some(notag) = tags.get("") {.
Since unifying them would technically change the implementation a bit by parsing more stuff that was previously not parsed, you'd need to do some testing by parsing the ethereum tests to make sure nothing breaks. We don't have extensive ethereum tests being parsed as part of the github CI because they would take a long time, so you will need to test that locally.
About the compiler I think you can just pass a &'a mut Compiler to the generic parser functions.
@ed255 Ok got it. I have pushed changes for parse_calldata and parse_code. A few comments while I was deduplicating. 1) yaml seems to have cleaner/understandable code than json so I used yaml's functions as the base for deduplication. 2) yaml's parse_calldata returns a Label so json will simulate the same behavior. 3) yaml does not check for asm as a valid tag, but I included it as a possibility because json includes it.