cairo-vm
Redesign hints processing
Our current hint handling has a few issues:
- It exposes the whole VM state, limiting how much of the inner workings of the VM we can change further down the road;
- It forces all the processing to happen during program execution, without the ability to validate/preprocess beforehand;
- It relies too much on &mut vm, which forces a lot of cloning to make the borrow checker happy;
- It is, as a result, too unfriendly towards custom hint processors.
This ticket is not about directly solving the issues, as that will probably take several iterations that may be done independently. This is about discussing how to solve them. Any feedback is welcome.
My current proposal is the following:
- Determine what hints actually need. At first sight, they seem to need the following:
  - Mutable access to AP;
  - Immutable access to FP and PC;
  - Mutable access to the VM memory (but in a proxy, as we shouldn't include accesses made by hints into the accessed addresses tracking, according to the Python VM);
  - Mutable access to the current hint execution scope;
  - The ability to add and remove one level from the scope;
  - A way to map referenced IDs to VM memory;
  - Access to AP tracking;
- Refactor the current (WIP) trait to only receive what's needed;
- Split the process into compilation (run during program loading) and execution (run during VM steps). The trait would now need to implement two or maybe three methods, as the executor is bound to the compiler.
  - The compilation step would need to store relevant data in a new custom structure, which represents the hint in some compiled form (maybe Python bytecode for Python executors, an enumerated value for our hardcoded hints, or even the plain string as we use it now, lambdas, etc.) and its auxiliary data, such as resolved references, PC (as it's already known), AP tracking information, etc.;
  - The execution step would receive the previously created object as well as the scope, the FP and AP, etc., and would then execute the compiled hint accordingly.
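To make the compile/execute split more concrete, here is a minimal sketch of what such a trait could look like. Everything in it is an assumption for illustration: the names `HintProcessor`, `CompiledHint`, `HintContext` and `BuiltinHintProcessor` are hypothetical, the context only carries a subset of the items listed above, and the hint string `ap += N` is made up. The associated type is one way to express that the executor is bound to the compiler.

```rust
use std::collections::HashMap;

/// Hypothetical compiled form of a hint (not the actual cairo-vm type).
#[derive(Debug, Clone, PartialEq)]
pub enum CompiledHint {
    /// A hardcoded hint represented by an enumerated value.
    AddToAp { offset: usize },
    /// Fallback: keep the plain source string, as is done today.
    Raw(String),
}

/// Hypothetical execution context: only what the hint is allowed to touch.
pub struct HintContext<'a> {
    pub ap: &'a mut usize,                   // mutable access to AP
    pub fp: usize,                           // read-only copy of FP
    pub pc: usize,                           // read-only copy of PC
    pub scope: &'a mut HashMap<String, i64>, // current execution scope
}

pub trait HintProcessor {
    /// The executor can only run hints produced by its own compiler.
    type Compiled;

    /// Would run once per hint, during program loading.
    fn compile_hint(&self, code: &str) -> Result<Self::Compiled, String>;

    /// Would run during VM steps, on the precompiled hint.
    fn execute_hint(
        &self,
        hint: &Self::Compiled,
        ctx: &mut HintContext<'_>,
    ) -> Result<(), String>;
}

pub struct BuiltinHintProcessor;

impl HintProcessor for BuiltinHintProcessor {
    type Compiled = CompiledHint;

    fn compile_hint(&self, code: &str) -> Result<CompiledHint, String> {
        // Toy matcher for the made-up hint "ap += N".
        match code.trim().strip_prefix("ap += ") {
            Some(n) => n
                .trim()
                .parse::<usize>()
                .map(|offset| CompiledHint::AddToAp { offset })
                .map_err(|e| format!("invalid offset: {e}")),
            None => Ok(CompiledHint::Raw(code.to_string())),
        }
    }

    fn execute_hint(
        &self,
        hint: &CompiledHint,
        ctx: &mut HintContext<'_>,
    ) -> Result<(), String> {
        match hint {
            CompiledHint::AddToAp { offset } => {
                *ctx.ap += *offset;
                Ok(())
            }
            CompiledHint::Raw(code) => Err(format!("unknown hint: {code}")),
        }
    }
}
```

With this shape, malformed hints such as `ap += x` would be rejected by `compile_hint` at load time instead of failing in the middle of a run.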
We are researching integrating Cleopatra with Protostar in order to improve execution times, mostly of test suites.
One thing we need from the VM implementation is support for our cheatcodes mechanism:
- Cheatcodes are regular Python objects, coming from Protostar's user space memory, injected into hints Python VM user space (globals/locals).
- They are more or less implemented similarly to syscalls.
- Injection is performed on entry-point level.
- Theoretically this happens a layer above CairoRunner, so we should be able to integrate nicely with Cleopatra, but for StarkWare's Cairo VM we rely on the ability to swap internal classes with our own subclasses that modify inner logic.
- We do not require cheatcode objects to be Python objects defined in Protostar's user space.
- These could be Rust-defined, and some kind of bridge mechanism could do the job.
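As a rough illustration of the Rust-defined option, the sketch below models cheatcodes as named callables that get injected into a fresh set of hint globals for each entry point. All names (`Cheatcode`, `CheatcodeRegistry`, `HintGlobals`, `Felt`) are hypothetical, `i64` stands in for a real field element type, and the bridge to an actual Python environment is left out entirely.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for a field element; a real VM would use its own type.
pub type Felt = i64;

/// A cheatcode modelled as a callable, similar in spirit to a syscall.
pub type Cheatcode = Rc<dyn Fn(&[Felt]) -> Result<Felt, String>>;

/// Hypothetical stand-in for the globals/locals visible to hint code.
#[derive(Default)]
pub struct HintGlobals {
    cheatcodes: HashMap<String, Cheatcode>,
}

impl HintGlobals {
    /// Look up a cheatcode by name and call it with the given arguments.
    pub fn call(&self, name: &str, args: &[Felt]) -> Result<Felt, String> {
        let cheatcode = self
            .cheatcodes
            .get(name)
            .ok_or_else(|| format!("unknown cheatcode: {name}"))?;
        cheatcode(args)
    }
}

/// Registry owned by the layer above the runner (e.g. a test framework).
#[derive(Default)]
pub struct CheatcodeRegistry {
    entries: Vec<(String, Cheatcode)>,
}

impl CheatcodeRegistry {
    pub fn register<F>(&mut self, name: &str, f: F)
    where
        F: Fn(&[Felt]) -> Result<Felt, String> + 'static,
    {
        let cheatcode: Cheatcode = Rc::new(f);
        self.entries.push((name.to_string(), cheatcode));
    }

    /// Build fresh globals; intended to be called once per entry point.
    pub fn inject(&self) -> HintGlobals {
        let cheatcodes = self
            .entries
            .iter()
            .map(|(name, c)| (name.clone(), Rc::clone(c)))
            .collect();
        HintGlobals { cheatcodes }
    }
}
```

A caller would register its cheatcodes once, call `inject()` before running each entry point, and hand the resulting globals to whatever executes the hints.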
So for us, it would be nice if the new hints interface either:
- Allowed us full control over the Python VM instance used for both compilation and execution of hints.
- Allowed us to swap the Python VM implementation used for hints with a custom one, tailored to our needs.
Of course, I am assuming that using a real Python implementation for hint execution will be part of the project. Whether this is PyO3 or RustPython makes no difference to us. If needed, we can even roll our own executor.
What do you think?
(For the rest of things, we are tracking them here: https://github.com/software-mansion/protostar/issues/572).
Hello, we are happy to see you have taken interest in the Cairo Rust VM. The purpose of the new hint interface is to have stricter and clearer boundaries between hints and the VM itself, so that hints can only access the parts of the VM that they need, instead of the whole VM. This implementation would mean that the hint processor would receive a proxy (a more limited version of the VM) instead of the VM itself, and wouldn't really fulfil your needs of having full control of the VM instance. Nevertheless, customizing the VM itself is still possible (just not through the hint processor). Please let us know if you need any changes or extra features!
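For illustration only, a proxy of this kind could look roughly like the sketch below. This is not the actual cairo-vm code: `VirtualMachine`, `VmProxy` and `MemoryProxy` are simplified, hypothetical types, memory is a plain map of `i64` values, and only AP, FP, PC and memory are exposed. The memory proxy does not record accesses, and it rejects inconsistent writes because Cairo memory is write-once.

```rust
use std::collections::{HashMap, HashSet};

pub type Address = usize;

/// Heavily simplified, hypothetical VM.
pub struct VirtualMachine {
    ap: usize,
    fp: usize,
    pc: usize,
    memory: HashMap<Address, i64>,
    accessed_addresses: HashSet<Address>,
}

impl VirtualMachine {
    pub fn new() -> Self {
        VirtualMachine {
            ap: 0,
            fp: 0,
            pc: 0,
            memory: HashMap::new(),
            accessed_addresses: HashSet::new(),
        }
    }

    pub fn ap(&self) -> usize {
        self.ap
    }

    /// Regular VM read: the access is recorded.
    pub fn read(&mut self, addr: Address) -> Option<i64> {
        self.accessed_addresses.insert(addr);
        self.memory.get(&addr).copied()
    }

    pub fn was_accessed(&self, addr: Address) -> bool {
        self.accessed_addresses.contains(&addr)
    }

    /// Hand out the limited view that a hint processor would receive.
    pub fn hint_proxy(&mut self) -> VmProxy<'_> {
        VmProxy {
            ap: &mut self.ap,
            fp: self.fp,
            pc: self.pc,
            memory: MemoryProxy { memory: &mut self.memory },
        }
    }
}

/// Memory view for hints: reads and writes are not tracked as accesses.
pub struct MemoryProxy<'a> {
    memory: &'a mut HashMap<Address, i64>,
}

impl MemoryProxy<'_> {
    pub fn get(&self, addr: Address) -> Option<i64> {
        self.memory.get(&addr).copied()
    }

    /// Write-once semantics: rewriting a cell with a different value fails.
    pub fn insert(&mut self, addr: Address, value: i64) -> Result<(), String> {
        if let Some(existing) = self.memory.get(&addr).copied() {
            if existing != value {
                return Err(format!("inconsistent write at address {addr}"));
            }
        }
        self.memory.insert(addr, value);
        Ok(())
    }
}

/// The limited version of the VM exposed to hints.
pub struct VmProxy<'a> {
    pub ap: &'a mut usize,
    pub fp: usize,
    pub pc: usize,
    pub memory: MemoryProxy<'a>,
}
```

A hint can write through `proxy.memory` and bump `*proxy.ap`, but it has no way to reach `accessed_addresses` or any other field the VM does not hand over.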
> The purpose of the new hint interface is to have stricter and clearer boundaries between hints and the VM itself, so that hints can only access the parts of the VM that they need, instead of the whole VM. This implementation would mean that the hint processor would receive a proxy (a more limited version of the VM) instead of the VM itself, and wouldn't really fulfil your needs of having full control of the VM instance.
Oh, I think I took a big shortcut in my previous message. By VM I meant the Python VM that executes hints. I am editing the previous comment.
For us, what's important is an ability to control the Python environment which hints are executed in.
The feature set of the Cairo VM proxy that you have described should do the job for us. The cheatcode that dives into Cairo VM internals the most (and thus could be the most invasive) is the reflect cheatcode, which allows read-only structured memory access. (Note that the API of reflect is funky because of a limitation of the cairo-lang VM: it doesn't give direct access to pointers in memory and instead dereferences them immediately.)
Other interesting cheatcodes are the expect_* family, which monitor hint execution for any exceptions; we have deliberately made these cheatcodes take effect for the whole hint in order to avoid intercepting the instruction-by-instruction execution of hint code.
> Nevertheless, customizing the VM itself is still possible (just not through the hint processor).
In general, I think that we should be able to do cheatcode→VM customization indirectly, via some kind of message passing if needed, and I believe we can cooperate to get things running smoothly via the Proxy struct.
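One possible shape for such message passing, sketched with the standard library's `std::sync::mpsc` channel: the cheatcode side only holds a sender, and the VM side applies pending requests between hints. The request variants and the names `VmRequest`, `CheatcodeHandle`, `VmSide` and `connect` are made up for illustration and do not correspond to any existing Protostar or cairo-vm API.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical requests a cheatcode might send towards the VM side.
#[derive(Debug, PartialEq)]
pub enum VmRequest {
    ExpectFailure,
    SetStepLimit(usize),
}

/// What the cheatcode gets: it can only send requests, not touch the VM.
pub struct CheatcodeHandle {
    tx: Sender<VmRequest>,
}

impl CheatcodeHandle {
    pub fn send(&self, request: VmRequest) -> Result<(), String> {
        self.tx.send(request).map_err(|e| e.to_string())
    }
}

/// What the VM side keeps: the receiver plus the state it customizes.
pub struct VmSide {
    rx: Receiver<VmRequest>,
    pub expect_failure: bool,
    pub step_limit: Option<usize>,
}

impl VmSide {
    /// Apply all pending requests without blocking; returns how many
    /// were applied. Intended to be called between hints.
    pub fn drain(&mut self) -> usize {
        let mut applied = 0;
        for request in self.rx.try_iter() {
            match request {
                VmRequest::ExpectFailure => self.expect_failure = true,
                VmRequest::SetStepLimit(n) => self.step_limit = Some(n),
            }
            applied += 1;
        }
        applied
    }
}

/// Create a connected pair of cheatcode handle and VM-side receiver.
pub fn connect() -> (CheatcodeHandle, VmSide) {
    let (tx, rx) = channel();
    (
        CheatcodeHandle { tx },
        VmSide { rx, expect_failure: false, step_limit: None },
    )
}
```

The point of the indirection is that the cheatcode never borrows the VM, so the set of possible customizations is exactly the set of request variants.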
Regarding this redesign, I realized there was a simpler way than my original proposal. I'm not sure it's worth changing now, but just for the record, we could have kept the fields in struct VM as pub(crate) and just written public getters and setters for what we meant to be accessible by hint processors. This way our old code would still have worked, we wouldn't have to keep references inside structures (which makes it hard to interact with Python), and downstream could just get an opaque VM and call its public methods to interact with it.
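A minimal sketch of that alternative, assuming a much-simplified VM: the field and method names below are hypothetical, not the real cairo-vm ones. Fields stay `pub(crate)`, so code inside the crate keeps using them directly, while other crates only see the public getters and setters.

```rust
mod vm {
    /// Simplified, hypothetical VM. `pub(crate)` fields are visible
    /// everywhere inside this crate but hidden from downstream crates.
    pub struct VirtualMachine {
        pub(crate) ap: usize,
        pub(crate) fp: usize,
        pub(crate) pc: usize,
        pub(crate) step_count: usize,
    }

    impl VirtualMachine {
        pub fn new() -> Self {
            VirtualMachine { ap: 0, fp: 0, pc: 0, step_count: 0 }
        }

        // Public surface meant for hint processors.
        pub fn get_ap(&self) -> usize {
            self.ap
        }

        pub fn set_ap(&mut self, ap: usize) {
            self.ap = ap;
        }

        pub fn get_fp(&self) -> usize {
            self.fp
        }

        pub fn get_pc(&self) -> usize {
            self.pc
        }

        /// Crate-internal logic keeps direct field access.
        pub(crate) fn step(&mut self) {
            self.pc += 1;
            self.step_count += 1;
        }
    }
}

/// Downstream-style hint: treats the VM as opaque and uses only its
/// public methods, so no references need to be stored in a proxy.
pub fn example_hint(machine: &mut vm::VirtualMachine) {
    let ap = machine.get_ap();
    machine.set_ap(ap + 2);
}
```

Note that the visibility restriction only takes effect across crate boundaries; within a single crate, as in this snippet, the `pub(crate)` fields remain reachable.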