go-ethereum
go-ethereum copied to clipboard
Lag between tx inclusion and accessible state over RPC
I'm sometimes seeing a lag in updating of the state as of geth v1.15.8. I am sending a transaction that is validated and included in a block, I wait for its receipt and retrieve it, at this point I can get the block at latest and it returns to me the block with my transaction in it. But if I make these calls fast enough and add a call to eth_call or eth_getCode with the latest tag, it does not return to me the same state. If I send a successfully delegated 7702 transaction for example, and I wait for the receipt and then check eth_getCode, I will sometimes get 0x as my code. If I put this call on a quick loop it returns the correct code quickly thereafter.
If I change the block identifier in the eth_call or eth_getCode to be the blockhash from the transaction receipt, it then works every time. But latest should be the same state as that block.
I went back a few versions and noticed this change happened between versions v1.15.7 (no issues) and v1.15.8.
from @fselmo
--
I'm trying to repro (branch here), but no success yet.
For context, this is the test that picked this up.
This is the PR that fixes this issue for us, using the blockNumber from the receipt instead of "latest".
I played around with a script for a while and could not reproduce either until I passed in even just an empty datadir - then I was able to get a reproducible script going. The datadir seems to be an important piece. The script starts up geth each run on a new port and so can be run concurrently to see failures more easily.
I silenced any other errors other than the eth_getCode errors for now but they also seem possibly related to state. Some are insufficient funds for the acct which lead me to believe that the funding tx is going through a similar state lag before we send the 7702 tx.
import asyncio
import os
import re
import select
import subprocess
import tempfile
import time
from web3 import (
AsyncWeb3,
WebSocketProvider,
)
GETH_BIN = os.getenv("GETH_BINARY")
GREEN_DOT = "\033[32m.\033[0m" # success
RED_X = "\033[31mx\033[0m" # expected failure case
YELLOW_S = "\033[33ms\033[0m" # unexpected failure, skip
class UnexpectedCode(Exception):
pass
def get_geth_port(proc, timeout=20):
start = time.time()
pattern = re.compile(r"127\.0\.0\.1:(\d+)")
while time.time() - start < timeout:
rlist, _, _ = select.select([proc.stderr], [], [], 0.1)
if not rlist:
continue
line = proc.stderr.readline()
if not line:
continue
match = pattern.search(line)
if match:
port = int(match.group(1))
if port not in {0, 80}:
return port
raise TimeoutError("Timeout waiting for port")
def cleanup(proc, tmp_dir):
proc.terminate()
try:
proc.wait(timeout=5)
if proc.stdout:
proc.stdout.close()
if proc.stderr:
proc.stderr.close()
tmp_dir.cleanup()
except subprocess.TimeoutExpired:
proc.kill()
proc.wait()
async def main():
tmp_dir = tempfile.TemporaryDirectory()
datadir = os.path.join(tmp_dir.name, "datadir")
os.makedirs(datadir, exist_ok=True)
proc = subprocess.Popen(
[
GETH_BIN,
"--datadir", # passing in a `datadir` is required to reproduce
datadir,
"--dev",
"--dev.period",
"3",
"--ws",
"--ws.port",
"0", # let geth find an open port, then parse and connect
"--ws.api",
"admin,eth,web3,debug",
],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
)
try:
port = get_geth_port(proc)
except TimeoutError:
# didn't parse port (if timeout too low), just skip this run
print(YELLOW_S, end="", flush=True)
cleanup(proc, tmp_dir)
return
async with AsyncWeb3(WebSocketProvider(f"ws://127.0.0.1:{port}")) as w3:
try:
accts = await w3.eth.accounts
keyfile_acct = accts[0]
# fund new acct
acct = w3.eth.account.create()
tx_hash = await w3.eth.send_transaction(
{"from": keyfile_acct, "to": acct.address, "value": 10**22}
)
await w3.eth.wait_for_transaction_receipt(tx_hash)
# send 7702 tx
chain_id = await w3.eth.chain_id
nonce = await w3.eth.get_transaction_count(acct.address)
tx = {
"chainId": chain_id,
"nonce": nonce,
"gas": 200_000,
"maxFeePerGas": 10**11,
"maxPriorityFeePerGas": 10**11,
"to": acct.address,
"value": 0,
"accessList": [],
"authorizationList": [
acct.sign_authorization(
{
"address": "0xdeadbeef00000000000000000000000000000000",
"chainId": chain_id,
"nonce": nonce + 1,
}
)
],
"data": "0x",
}
signed = acct.sign_transaction(tx)
tx_hash = await w3.eth.send_raw_transaction(signed.raw_transaction)
await w3.eth.wait_for_transaction_receipt(tx_hash)
code = await w3.eth.get_code(acct.address) # the default block id is "latest"
if not code.hex().startswith("ef0100deadbeef"):
raise UnexpectedCode("Code was not set!")
# clear code
clear_tx = dict(tx) # copy tx and replace relevant fields
clear_tx["nonce"] = nonce + 2
clear_tx["authorizationList"] = [
acct.sign_authorization(
{
"chainId": chain_id,
"address": f"0x{'00' * 20}",
"nonce": nonce + 3,
}
)
]
signed_reset = acct.sign_transaction(clear_tx)
clear_tx_hash = await w3.eth.send_raw_transaction(
signed_reset.raw_transaction
)
await w3.eth.wait_for_transaction_receipt(clear_tx_hash)
reset_code = await w3.eth.get_code(acct.address)
if reset_code.hex() != "":
raise UnexpectedCode("Code was not cleared!")
print(GREEN_DOT, end="", flush=True)
except UnexpectedCode as e:
print(RED_X, end="", flush=True)
except Exception as e:
# skip other exceptions... these are much rarer + also seem related to
# state (e.g. not funding the acct in time)
print(YELLOW_S, end="", flush=True)
cleanup(proc, tmp_dir)
if __name__ == "__main__":
asyncio.run(main())
The following command will run the flow 80 times within 10 concurrent processes at any given time, which isn't a ton of load, but can adjust those values if need be.
$ pip install -U web3
$ seq 80 | xargs -n1 -P10 python /path/to/script/above
I get fairly consistent failures with this.
just encounter this as well, seems related to sync write option change
seems related to sync write option change
I can confirm that reverting that commit locally and running the above script fixes the issue
Actually while I'm here, I tested current master and it seems to have this fixed.
So as long as it is known where the problem was and maybe there is a test for this to make sure it doesn't come back, this is probably safe to close @lightclient
Actually while I'm here, I tested current master and it seems to have this fixed.
Not sure how intentional the change was toward this problem (definitely seems like it is related) but this is the PR that fixed it.
@rjl493456442 any thoughts on this?
Hey all, sorry for the chaos and thanks for catching it.
But as @fselmo said, the Pebble Syncmode write has already been replaced in https://github.com/ethereum/go-ethereum/pull/31703. Please use the latest master or wait a bit for the next release.
would like to work on this!
I can confirm this is now fixed in v1.16.0. Feel free to close this issue.