erigon
erigon copied to clipboard
sporadic failure to retrieve transactions by hash on historical blocks
System information
Erigon version: ./erigon --version
FAILING VERSION: v2022.07.03 (and I am pretty sure v2022.07.02. also)
WORKING VERSION: v2022.04.05 (last 'beta')
Expected behaviour
Passing a historical txhash to eth_getTransactionByHash or eth_getTransactionReceipt RPC correctly returns the requested transaction.
Actual behaviour
Intermittently, a result of 'null' is returned in RPC respond instead of the transaction header. This persists for minutes to hours for some particular txhash, but eventually returns to the expected behavior again.
Steps to reproduce the behaviour
It is easily reproduced with the below python script (and also does not reproduce on 2022.04.05). Notably, on failing txhashes, the transaction can still be retrieved by using eth_getTransactionByBlockNumberAndIndex. So there seems to be some issue pertaining to retrieval by hash.
Python code:
from web3 import Web3
from random import randint, choice
w3 = Web3(Web3.HTTPProvider('http://localhost:8545'))
now = w3.eth.blockNumber
while True:
old = now - randint(0, 1000000)
block = w3.eth.get_block(old)
txs =[t.hex() for t in block['transactions']]
if len(txs) > 0:
txhash = txs[0] # pick first tx in a random old block
resp = w3.provider.make_request('eth_getTransactionByHash', [txhash])
if resp['result'] is None: # if result == null, we hit the bug
print(old, now-old, txhash)
print(resp) # <--- this has a result of 'null' in JSONRPC response
resp = w3.provider.make_request('eth_getTransactionByBlockNumberAndIndex', [hex(old), hex(0)])
print(resp) # <--- this returns correct result!
# commented out to reduce spam, but all txs in a give block seem to be affected
# print('whole block:')
# for i, txhash in enumerate(txs): # now check all txs in the offending block
# resp = w3.provider.make_request('eth_getTransactionByHash', [txhash])
# print(i, resp)
# resp = w3.provider.make_request('eth_getTransactionByBlockNumberAndIndex', [hex(old), hex(i)])
# print(i, resp)
I guess a fairly obvious question is whether there is some kind of corruption in my db. Would be good to see if this can reproduced on another erigon instance (I only have one to test).
I think it rather points to the index of txhash => tx number being unreliable
My random sampling seems to have converged to a contiguous hole from block 15140000 to 15141999... exactly 2000 blocks. Does this ring any bells w.r.t. the internal structures?
I am wondering if it will move around at some point (since observation seems to be that the behavior should "heal" itself eventually).
My random sampling seems to have converged to a contiguous hole from block 15140000 to 15141999... exactly 2000 blocks. Does this ring any bells w.r.t. the internal structures?
I am wondering if it will move around at some point (since observation seems to be that the behavior should "heal" itself eventually).
This looks related to snapshots. Something like: txs was moved to snapshot, but erigon can’t find them. Show us some Erigon startup logs, some erigon recent logs. Also check if erigon restart help or not.
Update, 8 hours later, the contiguous "bad" range spans now from block 15140000 to 15143999 (+2000 blocks added).
Note this is still statistical sampling... so its possible the numbers above not exact. But seems like a strong signal.
Here is log from most recent restart:
Jul 25 21:10:11 slc04 ERIGON[606667]: [INFO] [07-25|21:10:11.421] Starting metrics server addr=http://0.0.0.0:6060/debug/metrics/prometheus
Jul 25 21:10:11 slc04 ERIGON[606667]: [INFO] [07-25|21:10:11.421] Build info git_branch=v2022.07.03-bgelb git_tag= git_commit=2f8f539daac12bad029963157d03495e81c58bb7
Jul 25 21:10:11 slc04 ERIGON[606667]: [INFO] [07-25|21:10:11.421] Starting Erigon on Ethereum mainnet...
Jul 25 21:10:11 slc04 ERIGON[606667]: [INFO] [07-25|21:10:11.422] Maximum peer count ETH=100 total=100
Jul 25 21:10:11 slc04 ERIGON[606667]: [INFO] [07-25|21:10:11.422] torrent verbosity level=2
Jul 25 21:10:13 slc04 ERIGON[606667]: [INFO] [07-25|21:10:13.525] Set global gas cap cap=50000000
Jul 25 21:10:13 slc04 ERIGON[606667]: [INFO] [07-25|21:10:13.577] Opening Database label=chaindata path=/eth/erigon/chaindata
Jul 25 21:10:13 slc04 ERIGON[606667]: [INFO] [07-25|21:10:13.581] Initialised chain configuration config="{ChainID: 1, Homestead: 1150000, DAO: 1920000, DAO Support: true, Tangerine Whistle: 2463000, Spurious Dragon: 2675000, Byzantium: 4370000, Constantinople: 7280000, Petersburg: 7280000, Istanbul: 9069000, Muir Glacier: 9200000, Berlin: 12244000, London: 12965000, Arrow Glacier: 13773000, Gray Glacier: 15050000, Terminal Total Difficulty: <nil>, Merge Netsplit: <nil>, Engine: ethash}" genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
Jul 25 21:10:13 slc04 ERIGON[606667]: [INFO] [07-25|21:10:13.581] Initialising Ethereum protocol network=1
Jul 25 21:10:13 slc04 ERIGON[606667]: [INFO] [07-25|21:10:13.581] Using snapshots on=true
Jul 25 21:10:13 slc04 ERIGON[606667]: [INFO] [07-25|21:10:13.581] Effective prune_flags= snapshot_flags="--snapshots=true"
Jul 25 21:10:17 slc04 ERIGON[606667]: [INFO] [07-25|21:10:17.971] Disk storage enabled for ethash DAGs dir=/eth/erigon/ethash-dags count=2
Jul 25 21:10:18 slc04 ERIGON[606667]: [INFO] [07-25|21:10:18.646] Starting private RPC server on=127.0.0.1:9090
Jul 25 21:10:18 slc04 ERIGON[606667]: [INFO] [07-25|21:10:18.646] new subscription to logs established
Jul 25 21:10:18 slc04 ERIGON[606667]: [INFO] [07-25|21:10:18.650] Started P2P networking version=66 self=enode://f94c1e911fe1805cb637925710b5bcfc6b93653a087997a93f1ec3d90b629419fe0cc8f934ddf2d38d17010e4cf8d9bfda516c5db5ccc635149ef41630d7d36e@127.0.0.1:30303 name=erigon/v2022.07.3-alpha-2f8f539d/linux-amd64/go1.18
Jul 25 21:10:18 slc04 ERIGON[606667]: [INFO] [07-25|21:10:18.660] [Snapshots] Fetching torrent files metadata
Jul 25 21:10:18 slc04 ERIGON[606667]: [INFO] [07-25|21:10:18.701] New txs subscriber joined
Jul 25 21:10:18 slc04 ERIGON[606667]: [INFO] [07-25|21:10:18.701] new subscription to newHeaders established
Jul 25 21:10:20 slc04 ERIGON[606667]: [INFO] [07-25|21:10:20.096] [txpool] Started
Jul 25 21:10:21 slc04 ERIGON[606667]: [INFO] [07-25|21:10:21.730] [Snapshots] Stat blocks=15124999 segments=15124999 indices=15124999 alloc=2.6GB sys=4.6GB
Jul 25 21:10:21 slc04 ERIGON[606667]: [INFO] [07-25|21:10:21.730] [1/16 Headers] Waiting for headers... from=15215887
Jul 25 21:10:24 slc04 ERIGON[606667]: [INFO] [07-25|21:10:24.736] Generating ethash verification cache epoch=507 percentage=96 elapsed=3.000s
Jul 25 21:10:25 slc04 ERIGON[606667]: [INFO] [07-25|21:10:25.578] [1/16 Headers] Processed highest inserted=15215889 age=9s
Jul 25 21:10:26 slc04 ERIGON[606667]: [INFO] [07-25|21:10:26.286] [6/16 Execution] Completed on block=15215889
Jul 25 21:10:27 slc04 ERIGON[606667]: [INFO] [07-25|21:10:27.167] [13/16 LogIndex] Running from blocks 15215888 to 15215889 endBlock=15215889
Jul 25 21:10:27 slc04 ERIGON[606667]: [INFO] [07-25|21:10:27.266] Timings (slower than 50ms) Headers=6.92s Senders=63ms Execution=632ms HashState=180ms IntermediateHashes=337ms CallTraces=167ms AccountHistoryIndex=162ms TxLookup=63ms
Jul 25 21:10:27 slc04 ERIGON[606667]: [INFO] [07-25|21:10:27.266] RPC Daemon notified of new headers from=15215887 to=15215889 header sending=19.14µs log sending=260ns
Jul 25 21:10:27 slc04 ERIGON[606667]: [INFO] [07-25|21:10:27.266] [1/16 Headers] Waiting for headers... from=15215889
Jul 25 21:10:27 slc04 ERIGON[606667]: [INFO] [07-25|21:10:27.845] Generating ethash verification cache epoch=508 percentage=80 elapsed=3.000s
Jul 25 21:10:32 slc04 ERIGON[606667]: [INFO] [07-25|21:10:32.915] [1/16 Headers] Processed highest inserted=15215890 age=7s
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.023] [6/16 Execution] Completed on block=15215890
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.471] [13/16 LogIndex] Running from blocks 15215890 to 15215890 endBlock=15215890
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.495] Timings (slower than 50ms) Headers=5.649s Execution=55ms HashState=75ms IntermediateHashes=162ms CallTraces=109ms AccountHistoryIndex=86ms
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.495] Tables PlainState=70.3GB AccountChangeSet=157.5GB StorageChangeSet=230.9GB BlockTransaction=8.4GB TransactionLog=422.5GB FreeList=400.0KB ReclaimableSpace=400.0MB
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.841] Commit cycle in=346.599906ms
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.842] RPC Daemon notified of new headers from=15215889 to=15215890 header sending=29.791µs log sending=310ns
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.842] [1/16 Headers] Waiting for headers... from=15215890
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.854] [1/16 Headers] Processed highest inserted=15215891 age=0
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.860] [6/16 Execution] Completed on block=15215891
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.862] [13/16 LogIndex] Running from blocks 15215891 to 15215891 endBlock=15215891
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.864] Commit cycle in=1.051021ms
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.864] RPC Daemon notified of new headers from=15215890 to=15215891 header sending=8.96µs log sending=140ns
Jul 25 21:10:33 slc04 ERIGON[606667]: [INFO] [07-25|21:10:33.864] [1/16 Headers] Waiting for headers... from=15215891
Jul 25 21:10:47 slc04 ERIGON[606667]: [INFO] [07-25|21:10:47.868] [1/16 Headers] Processed highest inserted=15215892 age=13s
Jul 25 21:10:47 slc04 ERIGON[606667]: [INFO] [07-25|21:10:47.956] [6/16 Execution] Completed on block=15215892
Jul 25 21:10:48 slc04 ERIGON[606667]: [INFO] [07-25|21:10:48.383] [13/16 LogIndex] Running from blocks 15215892 to 15215892 endBlock=15215892
Jul 25 21:10:48 slc04 ERIGON[606667]: [INFO] [07-25|21:10:48.413] Timings (slower than 50ms) Headers=14.004s HashState=69ms IntermediateHashes=156ms CallTraces=99ms AccountHistoryIndex=89ms
Jul 25 21:10:48 slc04 ERIGON[606667]: [INFO] [07-25|21:10:48.413] Tables PlainState=70.3GB AccountChangeSet=157.5GB StorageChangeSet=230.9GB BlockTransaction=8.4GB TransactionLog=422.5GB FreeList=408.0KB ReclaimableSpace=408.0MB
Will try restart next...
Log after restart:
Jul 28 20:31:43 slc04 ERIGON[383187]: [INFO] [07-28|20:31:43.137] Starting metrics server addr=http://0.0.0.0:6060/debug/metrics/prometheus
Jul 28 20:31:43 slc04 ERIGON[383187]: [INFO] [07-28|20:31:43.137] Build info git_branch=v2022.07.03-bgelb git_tag= git_commit=2f8f539daac12bad029963157d03495e81c58bb7
Jul 28 20:31:43 slc04 ERIGON[383187]: [INFO] [07-28|20:31:43.138] Starting Erigon on Ethereum mainnet...
Jul 28 20:31:43 slc04 ERIGON[383187]: [INFO] [07-28|20:31:43.140] Maximum peer count ETH=100 total=100
Jul 28 20:31:43 slc04 ERIGON[383187]: [INFO] [07-28|20:31:43.140] torrent verbosity level=2
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.244] Set global gas cap cap=50000000
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.313] Opening Database label=chaindata path=/eth/erigon/chaindata
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.316] Initialised chain configuration config="{ChainID: 1, Homestead: 1150000, DAO: 1920000, DAO Support: true, Tangerine Whistle: 2463000, Spurious Dragon: 2675000, Byzantium: 4370000, Constantinople: 7280000, Petersburg: 7280000, Istanbul: 9069000, Muir Glacier: 9200000, Berlin: 12244000, London: 12965000, Arrow Glacier: 13773000, Gray Glacier: 15050000, Terminal Total Difficulty: <nil>, Merge Netsplit: <nil>, Engine: ethash}" genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.317] Initialising Ethereum protocol network=1
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.317] Using snapshots on=true
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.317] Effective prune_flags= snapshot_flags="--snapshots=true"
Jul 28 20:31:50 slc04 ERIGON[383187]: [INFO] [07-28|20:31:50.539] Disk storage enabled for ethash DAGs dir=/eth/erigon/ethash-dags count=2
Jul 28 20:31:50 slc04 ERIGON[383187]: [INFO] [07-28|20:31:50.976] Starting private RPC server on=127.0.0.1:9090
Jul 28 20:31:50 slc04 ERIGON[383187]: [INFO] [07-28|20:31:50.976] new subscription to logs established
Jul 28 20:31:50 slc04 ERIGON[383187]: [INFO] [07-28|20:31:50.980] Started P2P networking version=66 self=enode://f94c1e911fe1805cb637925710b5bcfc6b93653a087997a93f1ec3d90b629419fe0cc8f934ddf2d38d17010e4cf8d9bfda516c5db5ccc635149ef41630d7d36e@127.0.0.1:30303 name=erigon/v2022.07.3-alpha-2f8f539d/linux-amd64/go1.18
Jul 28 20:31:50 slc04 ERIGON[383187]: [INFO] [07-28|20:31:50.991] [Snapshots] Fetching torrent files metadata
Jul 28 20:31:51 slc04 ERIGON[383187]: [INFO] [07-28|20:31:51.473] New txs subscriber joined
Jul 28 20:31:51 slc04 ERIGON[383187]: [INFO] [07-28|20:31:51.474] new subscription to newHeaders established
Jul 28 20:31:53 slc04 ERIGON[383187]: [INFO] [07-28|20:31:53.931] [Snapshots] Stat blocks=15143999 segments=15143999 indices=15143999 alloc=2.6GB sys=3.4GB
Jul 28 20:31:53 slc04 ERIGON[383187]: [INFO] [07-28|20:31:53.931] [1/16 Headers] Waiting for headers... from=15234988
Now seems to be 1000 blocks hole from 15144000 to 15145000...
Please advise what else you'd like to see in erigon log...
Looking at log pasted above:
Jul 28 20:31:53 slc04 ERIGON[383187]: [INFO] [07-28|20:31:53.931] [Snapshots] Stat blocks=15143999 segments=15143999 indices=15143999 alloc=2.6GB sys=3.4GB
It seems highly suspicious that block 15143999 coincides exactly with where i found issue via my script. Though I'm not knowledgeable enough to speculate exactly what is going on.
grepping a bit:
Jul 28 01:55:21 slc04 ERIGON[606667]: [INFO] [07-28|01:55:21.402] [snapshots] Merge done from=15130000
Jul 28 05:25:56 slc04 ERIGON[606667]: [INFO] [07-28|05:25:56.642] [snapshots] Retire Blocks range=15140k-15141k
Jul 28 05:26:45 slc04 ERIGON[606667]: [INFO] [07-28|05:26:45.621] [Snapshots Txs] Replacement preprocessing processed=42.73% alloc=4.8GB sys=8.7GB
Jul 28 05:27:05 slc04 ERIGON[606667]: [INFO] [07-28|05:27:05.621] [Snapshots Txs] Replacement preprocessing processed=84.02% alloc=5.7GB sys=8.7GB
Jul 28 05:27:14 slc04 ERIGON[606667]: [INFO] [07-28|05:27:14.241] [snapshots] Compression ratio=2.14 file=v1-015140-015141-transactions
Jul 28 05:27:17 slc04 ERIGON[606667]: [INFO] [07-28|05:27:17.552] [snapshots] Build indices from=15139999
Jul 28 09:12:12 slc04 ERIGON[606667]: [INFO] [07-28|09:12:12.953] [snapshots] Retire Blocks range=15141k-15142k
Jul 28 09:12:37 slc04 ERIGON[606667]: [INFO] [07-28|09:12:37.237] [snapshots] Dumping txs block num=15141684 alloc=5.4GB sys=8.7GB
Jul 28 09:13:31 slc04 ERIGON[606667]: [INFO] [07-28|09:13:31.376] [Snapshots Txs] Replacement preprocessing processed=15.36% alloc=5.5GB sys=8.7GB
Jul 28 09:13:51 slc04 ERIGON[606667]: [INFO] [07-28|09:13:51.345] [Snapshots Txs] Replacement preprocessing processed=34.15% alloc=6.1GB sys=8.7GB
Jul 28 09:14:11 slc04 ERIGON[606667]: [INFO] [07-28|09:14:11.345] [Snapshots Txs] Replacement preprocessing processed=55.50% alloc=5.4GB sys=8.7GB
Jul 28 09:14:31 slc04 ERIGON[606667]: [INFO] [07-28|09:14:31.347] [Snapshots Txs] Replacement preprocessing processed=73.51% alloc=3.6GB sys=8.7GB
Jul 28 09:14:51 slc04 ERIGON[606667]: [INFO] [07-28|09:14:51.344] [Snapshots Txs] Replacement preprocessing processed=92.67% alloc=3.7GB sys=8.7GB
Jul 28 09:14:55 slc04 ERIGON[606667]: [INFO] [07-28|09:14:55.380] [snapshots] Compression ratio=2.23 file=v1-015141-015142-transactions
Jul 28 09:14:57 slc04 ERIGON[606667]: [INFO] [07-28|09:14:57.752] [snapshots] Build indices from=15140999
I think my initial testing probably took place about here? Where I saw 15140000 to 15141999 goofed up...
Jul 28 12:56:11 slc04 ERIGON[606667]: [INFO] [07-28|12:56:11.703] [snapshots] Retire Blocks range=15142k-15143k
Jul 28 12:56:32 slc04 ERIGON[606667]: [INFO] [07-28|12:56:32.839] [snapshots] Dumping txs block num=15142900 alloc=5.4GB sys=8.7GB
Jul 28 12:57:01 slc04 ERIGON[606667]: [INFO] [07-28|12:57:01.818] [Snapshots Txs] Replacement preprocessing processed=66.25% alloc=3.3GB sys=8.7GB
Jul 28 12:57:11 slc04 ERIGON[606667]: [INFO] [07-28|12:57:11.789] [snapshots] Compression ratio=2.06 file=v1-015142-015143-transactions
Jul 28 12:57:13 slc04 ERIGON[606667]: [INFO] [07-28|12:57:13.734] [snapshots] Build indices from=15141999
Jul 28 16:47:33 slc04 ERIGON[606667]: [INFO] [07-28|16:47:33.903] [snapshots] Retire Blocks range=15143k-15144k
Jul 28 16:48:14 slc04 ERIGON[606667]: [INFO] [07-28|16:48:14.327] [Snapshots Txs] Replacement preprocessing processed=62.77% alloc=5.1GB sys=8.7GB
Jul 28 16:48:26 slc04 ERIGON[606667]: [INFO] [07-28|16:48:26.400] [snapshots] Compression ratio=2.12 file=v1-015143-015144-transactions
Jul 28 16:48:29 slc04 ERIGON[606667]: [INFO] [07-28|16:48:29.202] [snapshots] Build indices from=15142999
later re-check took place here, now goofed up from 15140000 to 15143999 (seems to line up with "Retire Blocks" ranges)
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.317] Using snapshots on=true
Jul 28 20:31:45 slc04 ERIGON[383187]: [INFO] [07-28|20:31:45.317] Effective prune_flags= snapshot_flags="--snapshots=true"
Jul 28 20:31:50 slc04 ERIGON[383187]: [INFO] [07-28|20:31:50.991] [Snapshots] Fetching torrent files metadata
Jul 28 20:31:53 slc04 ERIGON[383187]: [INFO] [07-28|20:31:53.931] [Snapshots] Stat blocks=15143999 segments=15143999 indices=15143999 alloc=2.6GB sys=3.4GB
Jul 28 20:33:51 slc04 ERIGON[383187]: [INFO] [07-28|20:33:51.833] [snapshots] Retire Blocks range=15144k-15145k
After restart, looked here... now only see 15144k to 15145k messed up...
So seems something bad happens upon "retire blocks" that gets corrected by a reboot, or perhaps a "merge"?
`` Jul 28 20:34:40 slc04 ERIGON[383187]: [INFO] [07-28|20:34:40.402] [Snapshots Txs] Replacement preprocessing processed=43.93% alloc=3.9GB sys=4.5GB Jul 28 20:35:00 slc04 ERIGON[383187]: [INFO] [07-28|20:35:00.400] [Snapshots Txs] Replacement preprocessing processed=93.24% alloc=4.0GB sys=4.5GB Jul 28 20:35:03 slc04 ERIGON[383187]: [INFO] [07-28|20:35:03.228] [snapshots] Compression ratio=2.25 file=v1-015144-015145-transactions Jul 28 20:35:06 slc04 ERIGON[383187]: [INFO] [07-28|20:35:06.585] [snapshots] Build indices from=15143999
Running with --snap.keepblocks did not prevent the issue from occuring
--snap.stop seems to be a successful workaround
appears to be fixed in v2022.08.01. If this makes sense then perhaps this issue can be closed.
This issue is stale because it has been open for 40 days with no activity. Remove stale label or comment, or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.