zkevm-circuits
zkevm-circuits copied to clipboard
Review usage of geth API calls
- Currently we're using the geth API
debug_traceBlockByHash
to get execution traces (with memory disabled, which we reconstruct on our side) - We're generating a trace of accessed data in the state trie + storage trie on our side currently: https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/93abf023f061c719d8737478b2d87539c517431a/bus-mapping/src/circuit_input_builder/access.rs#L111-L118 . But geth has a preState tracer https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go which we may use instead.
- Geth recently introduced a functionality to run several traces at the same time https://github.com/ethereum/go-ethereum/pull/26086 . That can be useful for us
another issue is to check whether it is possible to fetch 256 block hashes using one graphQL rather than 256 normal rpcs..
Scope: just review it and afterwards create the task needed for implementations.
Discussion:
- Need to check entire bus mapping implementation, all things provided are well formatted and if it's easy to pull the right info.
- Good thing: we already have the parsers.
- Taking random blocks, check if it matches with the process we do, if it matches we decide if we're missing changes or we dont replace it.
- Easy to figure out if we can get 256 block hashes from a single API call.
@ed255 I see that with diff
mode of the prestate tracer we can only get the changed locations in the storage in their final value after execution of the transaction. We wouldn't get information about the read storage that is not modified, or intermediate values if a storage position is overwritten.
Is this enough to prove the execution?
Our current implementation has into account all the accesses to the storage? And this call wouldn't give us that.
https://geth.ethereum.org/docs/developers/evm-tracing/built-in-tracers
@ed255 I see that with
diff
mode of the prestate tracer we can only get the changed locations in the storage in their final value after execution of the transaction. We wouldn't get information about the read storage that is not modified, or intermediate values if a storage position is overwritten.Is this enough to prove the execution?
Our current implementation has into account all the accesses to the storage? And this call wouldn't give us that.
https://geth.ethereum.org/docs/developers/evm-tracing/built-in-tracers
The diff
mode, as you say, won't be useful to replace our gen_state_access_trace
because we also need accounts and storages that are only read and not modified. But the prestate
mode of the prestateTracer
could work, because that ones tracks every part of the state that is touched.
@ed255 sorry, I read this "The prestate mode returns the accounts necessary to execute a given transaction" that seems to suggest that it only return the accounts, but then it says "It reexecutes the given transaction and tracks every part of state that is touched" which seems to suggest that it tracks much more. I will investigate.
This is what the call in question returns (diff mode not enabled) for tx 0xbfeec243916e51f5416dce5c4a4fd4623a13e14ce2e42d9b17a9cac58b2c2c2b: https://pastebin.com/fiDp9MFi using geth v1.10.26
Basically for each account that it touches it returns a key-value pair list of the storage, with every value that has been touched and its pre-state value for the transacion. It doesn't discriminate between writes and reads, and it doesn't keep intermediate steps. Checking the source code confirms this:
This method stores the pre-state for a storage slot:
https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go#L290-L298
And this is called when the opcode is SLOAD or STORE:
https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go#L141-L143
We don't see that the intermediate values or information on which are reads or writes or in which order they are done to be stored in this tracer.
Hence, if I understood the goal correctly, this tracer does not help to produce the same output that the current implementation of gen_state_access_trace
does.
Hence, if I understood the goal correctly, this tracer does not help to produce the same output that the current implementation of
gen_state_access_trace
does.
Ah I just realized I missed a critical information when giving details for this issue.
- Indeed the
gen_state_access_trace
returns a trace with all the state values that are accessed (the trace is in chronological order) - The place where we use
gen_state_access_trace
is here: https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/50d3ad254db7491932a32e2293bf03bde7fe5f8e/bus-mapping/src/circuit_input_builder.rs#L441-L463 - At the end we only care about the AccessSet (not the access order). So I think the correct question would be if the prestate tracer could replace the function
BuilderClient::get_state_accesses
, which returns a set of read/write addresses, read/write storages per address, read/write code of addresses.
@ed255 the pastebin doesn't work, here it is a working link: https://gist.github.com/leolara/d3c463bc80ef930522c2d34f3b116618
At the end we only care about the AccessSet (not the access order). So I think the correct question would be if the prestate tracer could replace the function BuilderClient::get_state_accesses, which returns a set of read/write addresses, read/write storages per address, read/write code of addresses.
It depends on the info we need. We can get:
- Which storage entries are read or written.
- The value before and after the transaction for those entries.
We cannot get:
- Whether the entries have been read or written. Although one might say that if the pre and post value are different, it has been written. Still if the same value is overwritten we couldn't detect that.
Another thing is that it looks it should not be that difficult to implement a Go native tracer in geth that returns what we want or more. The code for the tracer is short.
@ed255 from looking at AcessSet::from, now I am convinced that we could implement this based on the prestate tracer. If we only need the AccessSet.
@ed255 well, not for all opcodes. In our code we are tracing an access for SELFBALANCE, CODECOPY and CODESIZE, but that opcode is not in https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go
We could do a small contribution to add them.
@ed255 well, not for all opcodes. In our code we are tracing an access for SELFBALANCE, CODECOPY and CODESIZE, but that opcode is not in https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go
We could do a small contribution to add them.
Maybe that's not needed. Notice that SELFBALANCE
, CODECOPY
and CODESIZE
read a fields/code of the current account. And the current account is already loaded in the set when the *CALL*
opcode happens. So probably our handling of those opcodes in our gen_state_access_trace
is redudant.
Maybe the only issue I find in the prestate tracer is that it only logs access to account and storage slots; but not to code.
We could get the codes corresponding to all accessed accounts, but that's more than we need. For example, if an account is only accessed via a BALANCE
opcode, we only need the account data but not it's code. With the current BuilderClient implementation, retrieving this unneeded code means two things
- One extra geth call
- Unnecessary used rows in the bytecode circuit
- Unnecessary used rows in the keccak circuit (from the bytecode hash)
@ed255 we are basically doing something very similar to the geth native tracer and the native tracer is not so complicated, so:
- Do we gain much by executing it in the native tracer?
- If so, I think we can extend the native tracer very easily to get the code acesses.
@ed255 we are basically doing something very similar to the geth native tracer and the native tracer is not so complicated, so:
* Do we gain much by executing it in the native tracer?
I think that's a fair question. I don't think we gain much by executing the geth native pre-state tracer instead of our gen_state_access_trace
function. We remove code from our repository, which IMO is always good, but the gen_state_access_trace
function is quite straight forward and not very complex. Another benefit is that I assume that the geth pre-state tracer has been more tested, whereas gen_state_access_trace
not so much (our implementation could contain bugs that we'd need to fix). But currently those two seem like weak points to me.
* If so, I think we can extend the native tracer very easily to get the code acesses.
Yeah, that would be necessary if we want to replace our logic with the geth native pre-state tracer. After this discussion and analysis I'm not sure if it's worth the effort! Certainly this is not a priority and not a critical task at all.
Notes: maybe in the future we would like to have a zkevm tracer, but there's a complication to have to change upstream. We can mark this research as complete