miden-vm
miden-vm copied to clipboard
Program analysis tool
It would be cool if we had a way to analyze a given script. This would let us make informed judgements about scripts we may have in the standard library. Some interesting statistics that come to mind:
- Number of VM cycles it takes to execute the script.
- Number of program blocks in the script (for each block type).
- Number of cycles per assembly instruction. E.g., a program executed
u32add.full
operation x times, and this translated to y VM cycles.
Program analysis task list (grjte edit):
- [x] Number of VM cycles it takes to execute a program.
- [x] Number of NOOPs executed as part of a program.
- [x] Total number of cycles required to execute a particular assembly instruction and the average number of cycles per instance of that instruction.
- [ ] Range Checker: Length of range-checker trace (before padding)
- [ ] Range Checker: Number of unique range-checked values [probably a nice-to-have]
- [ ] Hasher: overall length of hasher trace
- [ ] Hasher: Length of hasher trace required for program hashing [might be non-trivial to get]
- [ ] Hasher: Number of rows in the hasher by hash type - e.g., 2-to-1, linear hash, Merkle path verification/update [probably a nice-to-have]
- [ ] Chiplets: Number of memory accesses required.
- [ ] Chiplets: Number of rows in the bitwise chiplet
- [ ] Identify "long pole" of the trace (e.g., stack, range checker, chiplets)
Would love to work on this issue.
I'd imagine rough spec would look like this:
- A csv file with number of cycles per op.
- A function to print a table with these values for all procedures in stdlib
- A function that takes in a script source string as argument and prints a table with the above values.
I'm assuming this should be part of a new crate. And what is the best way to handle source input? As file input or something else?
That would be awesome! I must say though that this is not a simple issue. We should probably break into several PRs and start with something simpler (e.g., a PR which covers the first bullet point from the above).
For the cycle count, a way to do it could be to execute a given script, get a reference to the Process
struct (we'll need to change the public interface for this), and then get the cycle count from there. Another option could be to use execute_iter()
function and then just execute it to the end collecting relevant stats in the process.
@bobbinth I added a simple script to output the total number of vm cycles it takes to execute a miden assembly script. It's a standalone package that takes source script as a file input and calls execute_iter()
function based on your suggestion. Was the intention to have this as a separate crate only to do analysis on singular scripts or did you have something else in mind?
Please let me know if this structure is ok and I'll try to add the rest of the stats in the same PR or we can split it as you suggested.
Thanks
Another question: Are the number of vm cycles required stack input agnostic? I imagine it should be but want to make sure.
According to @bobbinth it would be nice to have number of NOOPs executed as part of a program included to the analysis tool as it would be a good efficiency metric to track.
@tohrnii can you add a comment with a todo list of everything that we've done or talked about for this & check off the things that are done so we can see what's still pending on this issue?
The things we have currently in the program analysis tool.
- [x] Number of VM cycles it takes to execute a program.
- [x] Number of NOOPs executed as part of a program.
- [x] Total number of cycles required to execute a particular assembly instruction and the average number of cycles per instance of that instruction.
Other things that might be interesting to add to the program analysis according to @bobbinth,
- [ ] Number of range checks required.
- [ ] Number of hashes required.
- [ ] Number of memory accesses required.
We could probably go into a bit more detail for these. For example:
- Range check info:
- Length of range-checker trace (before padding)
- Number of unique range-checked values [probably a nice-to-have]
- Hasher info
- Overall length of hasher trace
- Length of hasher trace required for program hashing [might be non-trivial to get]
- Number of rows in the hasher by hash type - e.g., 2-to-1, linear hash, Merkle path verification/update [probably a nice-to-have]
We should also add number of rows in the bitwise chiplet to the list.
Finally, would probably be nice to know which part of the trace was the "long pole" - e.g., stack, range-checker, or chiplets.
Great, thanks @tohrnii @bobbinth ! I updated the issue description with the todo list. Once the current related PR #315 is updated and merged maybe we can prioritize this list (identify which things we want in the next release) and think about splitting some of these into separate issues (perhaps with a new tag)? Thoughts?
Closed by #1099 and others.