zkevm-circuits Review usage of geth API calls

Currently we're using the geth API debug_traceBlockByHash to get execution traces (with memory disabled, which we reconstruct on our side)
We're generating a trace of accessed data in the state trie + storage trie on our side currently: https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/93abf023f061c719d8737478b2d87539c517431a/bus-mapping/src/circuit_input_builder/access.rs#L111-L118 . But geth has a preState tracer https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go which we may use instead.
Geth recently introduced a functionality to run several traces at the same time https://github.com/ethereum/go-ethereum/pull/26086 . That can be useful for us

Nov 08 '22 09:11 ed255

another issue is to check whether it is possible to fetch 256 block hashes using one graphQL rather than 256 normal rpcs..

Nov 16 '22 02:11 lispc

Scope: just review it and afterwards create the task needed for implementations.

Nov 24 '22 12:11 aguzmant103

Discussion:

Need to check entire bus mapping implementation, all things provided are well formatted and if it's easy to pull the right info.
Good thing: we already have the parsers.
Taking random blocks, check if it matches with the process we do, if it matches we decide if we're missing changes or we dont replace it.
Easy to figure out if we can get 256 block hashes from a single API call.

Nov 24 '22 13:11 aguzmant103

@ed255 I see that with diff mode of the prestate tracer we can only get the changed locations in the storage in their final value after execution of the transaction. We wouldn't get information about the read storage that is not modified, or intermediate values if a storage position is overwritten.

Is this enough to prove the execution?

Our current implementation has into account all the accesses to the storage? And this call wouldn't give us that.

https://geth.ethereum.org/docs/developers/evm-tracing/built-in-tracers

Jan 03 '23 04:01 leolara

@ed255 I see that with diff mode of the prestate tracer we can only get the changed locations in the storage in their final value after execution of the transaction. We wouldn't get information about the read storage that is not modified, or intermediate values if a storage position is overwritten.

Is this enough to prove the execution?

Our current implementation has into account all the accesses to the storage? And this call wouldn't give us that.

https://geth.ethereum.org/docs/developers/evm-tracing/built-in-tracers

The diff mode, as you say, won't be useful to replace our gen_state_access_trace because we also need accounts and storages that are only read and not modified. But the prestate mode of the prestateTracer could work, because that ones tracks every part of the state that is touched.

Jan 04 '23 11:01 ed255

@ed255 sorry, I read this "The prestate mode returns the accounts necessary to execute a given transaction" that seems to suggest that it only return the accounts, but then it says "It reexecutes the given transaction and tracks every part of state that is touched" which seems to suggest that it tracks much more. I will investigate.

Jan 05 '23 05:01 leolara

This is what the call in question returns (diff mode not enabled) for tx 0xbfeec243916e51f5416dce5c4a4fd4623a13e14ce2e42d9b17a9cac58b2c2c2b: https://pastebin.com/fiDp9MFi using geth v1.10.26

Jan 19 '23 03:01 leolara

Basically for each account that it touches it returns a key-value pair list of the storage, with every value that has been touched and its pre-state value for the transacion. It doesn't discriminate between writes and reads, and it doesn't keep intermediate steps. Checking the source code confirms this:

This method stores the pre-state for a storage slot:

https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go#L290-L298

And this is called when the opcode is SLOAD or STORE:

https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go#L141-L143

We don't see that the intermediate values or information on which are reads or writes or in which order they are done to be stored in this tracer.

Jan 19 '23 03:01 leolara

Hence, if I understood the goal correctly, this tracer does not help to produce the same output that the current implementation of gen_state_access_trace does.

Jan 19 '23 03:01 leolara

Hence, if I understood the goal correctly, this tracer does not help to produce the same output that the current implementation of gen_state_access_trace does.

Ah I just realized I missed a critical information when giving details for this issue.

Indeed the gen_state_access_trace returns a trace with all the state values that are accessed (the trace is in chronological order)
The place where we use gen_state_access_trace is here: https://github.com/privacy-scaling-explorations/zkevm-circuits/blob/50d3ad254db7491932a32e2293bf03bde7fe5f8e/bus-mapping/src/circuit_input_builder.rs#L441-L463
At the end we only care about the AccessSet (not the access order). So I think the correct question would be if the prestate tracer could replace the function BuilderClient::get_state_accesses, which returns a set of read/write addresses, read/write storages per address, read/write code of addresses.

Jan 23 '23 11:01 ed255

@ed255 the pastebin doesn't work, here it is a working link: https://gist.github.com/leolara/d3c463bc80ef930522c2d34f3b116618

Jan 23 '23 11:01 leolara

At the end we only care about the AccessSet (not the access order). So I think the correct question would be if the prestate tracer could replace the function BuilderClient::get_state_accesses, which returns a set of read/write addresses, read/write storages per address, read/write code of addresses.

It depends on the info we need. We can get:

Which storage entries are read or written.
The value before and after the transaction for those entries.

We cannot get:

Whether the entries have been read or written. Although one might say that if the pre and post value are different, it has been written. Still if the same value is overwritten we couldn't detect that.

Jan 23 '23 12:01 leolara

Another thing is that it looks it should not be that difficult to implement a Go native tracer in geth that returns what we want or more. The code for the tracer is short.

Jan 23 '23 12:01 leolara

@ed255 from looking at AcessSet::from, now I am convinced that we could implement this based on the prestate tracer. If we only need the AccessSet.

Jan 23 '23 12:01 leolara

@ed255 well, not for all opcodes. In our code we are tracing an access for SELFBALANCE, CODECOPY and CODESIZE, but that opcode is not in https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go

We could do a small contribution to add them.

Jan 23 '23 12:01 leolara

@ed255 well, not for all opcodes. In our code we are tracing an access for SELFBALANCE, CODECOPY and CODESIZE, but that opcode is not in https://github.com/ethereum/go-ethereum/blob/0c40df5f28b70b4aafee526bed7575520f3b827b/eth/tracers/native/prestate.go

We could do a small contribution to add them.

Maybe that's not needed. Notice that SELFBALANCE, CODECOPY and CODESIZE read a fields/code of the current account. And the current account is already loaded in the set when the *CALL* opcode happens. So probably our handling of those opcodes in our gen_state_access_trace is redudant.

Maybe the only issue I find in the prestate tracer is that it only logs access to account and storage slots; but not to code. We could get the codes corresponding to all accessed accounts, but that's more than we need. For example, if an account is only accessed via a BALANCE opcode, we only need the account data but not it's code. With the current BuilderClient implementation, retrieving this unneeded code means two things

One extra geth call
Unnecessary used rows in the bytecode circuit
- Unnecessary used rows in the keccak circuit (from the bytecode hash)

Jan 23 '23 15:01 ed255

@ed255 we are basically doing something very similar to the geth native tracer and the native tracer is not so complicated, so:

Do we gain much by executing it in the native tracer?
If so, I think we can extend the native tracer very easily to get the code acesses.

Jan 24 '23 16:01 leolara

@ed255 we are basically doing something very similar to the geth native tracer and the native tracer is not so complicated, so:
* Do we gain much by executing it in the native tracer?

I think that's a fair question. I don't think we gain much by executing the geth native pre-state tracer instead of our gen_state_access_trace function. We remove code from our repository, which IMO is always good, but the gen_state_access_trace function is quite straight forward and not very complex. Another benefit is that I assume that the geth pre-state tracer has been more tested, whereas gen_state_access_trace not so much (our implementation could contain bugs that we'd need to fix). But currently those two seem like weak points to me.

* If so, I think we can extend the native tracer very easily to get the code acesses.

Yeah, that would be necessary if we want to replace our logic with the geth native pre-state tracer. After this discussion and analysis I'm not sure if it's worth the effort! Certainly this is not a priority and not a critical task at all.

Jan 25 '23 15:01 ed255

Notes: maybe in the future we would like to have a zkevm tracer, but there's a complication to have to change upstream. We can mark this research as complete

Jan 26 '23 12:01 aguzmant103

zkevm-circuits zkevm-circuits copied to clipboard

Review usage of geth API calls

zkevm-circuits
zkevm-circuits copied to clipboard