cairo-vm
cairo-vm copied to clipboard
Feature: deserialize CairoPie from ZIP archives
Feature: deserialize CairoPie from ZIP archives
Description
Problem: the Python VM generates Cairo PIEs as ZIP archives containing several JSON files and the memory as a binary file. We do not have a solution yet to deserialize these files into CairoPie objects.
Solution: add a new CairoPie::from_file(path) method that reads the ZIP file and extracts its contents.
Checklist
- [ ] Linked to Github Issue
- [x] Unit tests added
- [x] Integration tests added.
- [ ] This change requires new documentation.
- [ ] Documentation has been added/updated.
- [ ] CHANGELOG has been updated.
Hi @odesenfans ! I don't get, why we need this Why you should need to read the files generated by the Python vm?
Hi @pefontana ! We’re currently working on porting the Starknet bootloader on the Rust VM, and a part of our test suite is to run PIEs generated with the Python VM, including Starknet OS PIEs. Furthermore, there’s no other method to load PIEs from files afaik.
If it's for the test suite, can't a test helper handle the unzipping and loading? I find specially odd the fact that we load the memory dump when the VM is supposed to be producing it. Besides that, zip files need to be handled with care in general due to the risk of zip bombs.
@Oppen :
-
The Python VM produces a ZIP file, so IMO the loader needs to take a ZIP file as input. Sure, it's a problem, but I guess we have to live with that design choice.
-
The bootloader needs the memory of the PIE when reexecuting it, I guess to detect differences between the initial execution and the one on top of the bootloader. That's just a guess though. What I'm sure of is that I need to relocate the PIE memory in the bootloader hints, so I need to load it from the ZIP file.
- The Python VM produces a ZIP file, so IMO the loader needs to take a ZIP file as input. Sure, it's a problem, but I guess we have to live with that design choice.
I'm not sure that's necessarily true. Requirements and design choices can change, specially in reimplementations.
2. The bootloader needs the memory of the PIE when reexecuting it, I guess to detect differences between the initial execution and the one on top of the bootloader. That's just a guess though. What I'm sure of is that I need to relocate the PIE memory in the bootloader hints, so I need to load it from the ZIP file.
I see. Still, I think the ZIP handling can be done downstream by the library user, as long as you have a mechanism to pass the memory to the bootloader hints.
Related, does this mean non-determinism must always produce deterministic values? Otherwise, re-executing with a different VM may lead to unfair rejection if it looks for differences. And if we go by assuming you use the same VM, then what Python does here does not matter. What am I missing?
On a different issue: please don't mix code relocation with a feature PR. It makes it harder to review. Specifically I mean the code moved from deserialize_program.rs to deserialize_utils.rs.
Thanks for the comment.
Regarding the ZIP file, my point is mostly "There are Cairo PIEs generated as ZIP files in the Cairo ecosystem so the CairoPie struct should have a compatible loader". We can document that it should only be used with sanitized inputs, hide it behind a feature flag (ex: python-vm-compat), but I think it makes sense to have it here.
For the rest, indeed our use case is a temporary one (we will use PIEs generated with the Rust VM eventually), but "temporary" is always such a vague term that I believe there is merit in stabilizing this. The bootloader is a deterministic program, we're not yet there but I don't see any reason why it should generate a different output on different VMs as long as the hints have the same side effects. We're still working on that.
As for moving the code, I'll move the code relocation to a different branch then.
Regarding the ZIP file, my point is mostly "There are Cairo PIEs generated as ZIP files in the Cairo ecosystem so the
CairoPiestruct should have a compatible loader". We can document that it should only be used with sanitized inputs, hide it behind a feature flag (ex:python-vm-compat), but I think it makes sense to have it here.
I guess it may make sense. Let's get @pefontana and @lferrigno in the loop to see if we reach some agreement.
For the rest, indeed our use case is a temporary one (we will use PIEs generated with the Rust VM eventually), but "temporary" is always such a vague term that I believe there is merit in stabilizing this. The bootloader is a deterministic program, we're not yet there but I don't see any reason why it should generate a different output on different VMs as long as the hints have the same side effects. We're still working on that.
But the result of running the bootloader is the result of running a block of transactions that may or may not be deterministic, right? I mean, nondeterminism is a feature in Cairo, so is it still correct to assume two executions of possibly different VMs will always produce the same traces? Maybe we can consider it best effort and put some kind of disclaimer about it?
As for moving the code, I'll move the code relocation to a different branch then.
Thanks!
But the result of running the bootloader is the result of running a block of transactions that may or may not be deterministic, right? I mean, nondeterminism is a feature in Cairo, so is it still correct to assume two executions of possibly different VMs will always produce the same traces? Maybe we can consider it best effort and put some kind of disclaimer about it?
Yes. I need to ask more details about this, but anyway being able to check deterministic programs is already pretty good.
I moved the code back and fixed rebase conflicts. Waiting for additional reviews then :)
Hi @odesenfans ! Thanks for the PR, sorry for the late response, some people of the team were at holidays I think it is okay to merged it, just:
- Can you add a little documentation in the CairoPie::from_file(ZIP) method, documenting that it is used for bootloader execution
- Also, some CI tests are failing, can you fix them? If you have any trouble we can help you with this
Fixed it, the CI passes on our fork. I did have to add quite a few cfg(feature = "std") though (for wasm/no-std builds), and I'm wondering whether adding a distinct feature flag for this would make sense. Let me know.
Codecov Report
Attention: Patch coverage is 95.86957% with 19 lines in your changes are missing coverage. Please review.
Project coverage is 96.60%. Comparing base (
c692b75) to head (c10101b). Report is 1 commits behind head on main.
:exclamation: Current head c10101b differs from pull request most recent head aa646b2. Consider uploading reports for the commit aa646b2 to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| vm/src/vm/runners/cairo_pie.rs | 95.58% | 18 Missing :warning: |
| vm/src/serde/deserialize_program.rs | 97.72% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1533 +/- ##
==========================================
+ Coverage 96.56% 96.60% +0.04%
==========================================
Files 96 95 -1
Lines 38494 38798 +304
==========================================
+ Hits 37170 37482 +312
+ Misses 1324 1316 -8
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Rebased on top of main + added a from_bytes() method, would appreciate it if someone could approve the CI run.
Great @odesenfans ! Can you add a line to the Changelog.md describing the feature added? Like here https://github.com/lambdaclass/cairo-vm/pull/1559/files#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4ed
And I think we need another rebase with main and we will be good to go!
@fmoletta thanks for the review. Regarding your different points:
-
For the prime value, I agree that it makes sense checking that it matches the Cairo prime and return an error if it does not. I'll update this.
-
Regarding the builtins, I found this issue down the line and already have a fix for it. I was planning to put it in a second PR if that is okay for you.
I addressed your comments @fmoletta , let me know what you prefer regarding the signature builtin deserialization (cf. my comment above).
@fmoletta thanks for the review. Regarding your different points:
- For the prime value, I agree that it makes sense checking that it matches the Cairo prime and return an error if it does not. I'll update this.
- Regarding the builtins, I found this issue down the line and already have a fix for it. I was planning to put it in a second PR if that is okay for you.
Hi @odesenfans ! Can you make a PR with the BuitlinAdditionalData fix, so we merge it with this PR?
Hi @pefontana , I added my fixes on top of this PR. It required two commits instead of one. The first one fixes a problem with serde's untagged enum deserialization that fails for hashmaps that do not have strings as keys. The second one makes it possible to deserialize signature builtin data in the Python VM format.
Answered all comments and rebased. Sorry for the delay, I was busy with something else.
Rebased this and applied changes to fix WASM builds now that I understand a bit better how the crate::stdlib module is supposed to be used.
looks like some checks are broken, can you take a look @odesenfans
@juanbono I just pushed a fix :crossed_fingers:
I pushed a fix for the latest CI failure, which was complaining about null values missing from additional_data.json in the PIEs generated with this PR. The problem is simple: this PR introduces the use of CairoPieAdditionalData for the additional_data field instead of a hashmap. This makes it harder to reproduce the behavior of cairo-lang that will output a null for each builtin used by the program and nothing for the others. Implementing the same behaviour in Rust would require tracking the builtins used by the program directly in CairoPieAdditionalData with a non-serialized field and output nulls accordingly.
I think this added complexity makes little sense so I modified the PIE comparison script to just ignore null values from the comparison. Let me know what you think.
I pushed a fix for the latest CI failure, which was complaining about
nullvalues missing fromadditional_data.jsonin the PIEs generated with this PR. The problem is simple: this PR introduces the use ofCairoPieAdditionalDatafor theadditional_datafield instead of a hashmap. This makes it harder to reproduce the behavior ofcairo-langthat will output a null for each builtin used by the program and nothing for the others. Implementing the same behaviour in Rust would require tracking the builtins used by the program directly inCairoPieAdditionalDatawith a non-serialized field and output nulls accordingly.I think this added complexity makes little sense so I modified the PIE comparison script to just ignore null values from the comparison. Let me know what you think.
Outputting the same serialized cairo pies as cairo_lang is vital for the cairo-vm. I don't think a PR adding Deserialization should disturb this, is there another way to fix this?
Outputting the same serialized cairo pies as cairo_lang is vital for the cairo-vm. I don't think a PR adding Deserialization should disturb this, is there another way to fix this?
I think there is, although it's a bit more work. I just wanted to check with you if it was worth the extra effort for null values. I underline that the only difference is that null values are omitted from additional_data.json, while they are included in cairo-lang for all builtins used by the program. The rest is unchanged. If you think that these null values should still be there then I'll find a way to avoid changing that.
@fmoletta I fixed the issue. The solution I chose was to use an intermediate transformation into a hashmap when creating the ZIP archive. Null values are added there as needed. Let me know if this works for you.
This solution involves converting the HashMap representation into this new CairoPieAdditonalData struct and then back to a HashMap in order to serialize, this seems quite redundant, can remove the intermediary?
This solution involves converting the HashMap representation into this new CairoPieAdditonalData struct and then back to a HashMap in order to serialize, this seems quite redundant, can remove the intermediary?
It is definitely redundant. The problem is that I need the CairoPieAdditionalData struct when deserializing; deserialization was broken because deserialization as an untagged enum just would not work. One solution could be to keep the hashmap in CairoPie and use CairoPieAdditionalData only when deserializing. But IMO having a typed struct instead of a hashmap in CairoPie is better.
Another possibility is to implement FromIterator on CairoPieAdditionalData to avoid the first hashmap conversion.
What do you think? Any suggestion?
Thank you for your work! But we decided to merge #1729 which solves the issue. Feel free to open an issue/PR if there is something about this solution that doesn't work for you!