intermittent sync issues and size of chaindata folder keeps increasing after Rio HF.

Open ashu1777 opened this issue 2 months ago • 6 comments

System information

Erigon version: 3.1.3

OS & Version: Linux

Erigon Command (with flags/config):

  • erigon - --chain=bor-mainnet - --datadir=/home/erigon/persistence/data - --http.addr=0.0.0.0 - --rpc.accessList=/home/erigon/acl-config/acl.json - --rpc.batch.limit=10000 - --rpc.txfeecap=100 - --http.api=eth,erigon,web3,net,debug,txpool,trace - --http.vhosts=* - --http.corsdomain=null - --http - --ws - --db.pagesize=16384 - --ethash.dagdir=/home/erigon/persistence/dag - --maxpeers=100 - --identity=nd-147-610-977 - --private.api.addr=0.0.0.0:9090 - --private.api.ratelimit=63744 - --rpc.returndata.limit=10000000 - --metrics - --metrics.addr=0.0.0.0 - --healthcheck - --port=30303 - --db.size.limit=12TB - --torrent.download.rate=512mb - --torrent.download.slots=5 - --staticpeers=enode://e4fb013061eba9a2c6fb0a41bbd4149f4808f0fb7e88ec55d7163f19a6f02d64d0ce5ecc81528b769ba552a7068057432d44ab5e9e42842aff5b4709aa2c3f3b@34.89.75.187:30303,enode://a49da6300403cf9b31e30502eb22c142ba4f77c9dda44990bccce9f2121c3152487ee95ee55c6b92d4cdce77845e40f59fd927da70ea91cf935b23e262236d75@34.142.43.249:30303,enode://0e50fdcc2106b0c4e4d9ffbd7798ceda9432e680723dc7b7b4627e384078850c1c4a3e67f17ef2c484201ae6ee7c491cbf5e189b8ffee3948252e9bef59fc54e@35.234.148.172:30303,enode://a0bc4dd2b59370d5a375a7ef9ac06cf531571005ae8b2ead2e9aaeb8205168919b169451fb0ef7061e0d80592e6ed0720f559bd1be1c4efb6e6c4381f1bdb986@35.246.99.203:30303,enode://f2b0d50e0b843d38ddcab59614f93065e2c82130100032f86ae193eb874505de12fcaf12502dfd88e339b817c0b374fa4b4f7c4d5a4d1aa04f29c503d95e0228@35.197.233.240:30303,enode://72c3176693f7100dfedc8a37909120fea16971260a5d95ceff49affbc0e23968c35655fee75734736f0b038147645e8ceeee59af68859b3f5bf91fe249be6259@35.246.95.65:30303,enode://f0e44769385aea31de930d3f4796e3e348962221063bb9f681106d832d13f70e5543d652d30e819812104f1b1ffdd7585977b46bf802ed5a52cf731de8c48dbd@34.105.180.11:30303,enode://fc7624241515f9d5e599a396362c29de92b13a048ad361c90dd72286aa4cca835ba65e140a46ace70cc4dcb18472a476963750b3b69d958c5f546d48675880a8@34.147.169.102:30303,enode://198896e373735ba38a0313d073137a413787ece791fbc0
d0be0f9f6b9d9dd00ee0841f46519904d666d7f1cdfce5532b093e3a1574b34eb64224f57b9b7fce7b@34.89.55.74:30303,enode://07bc4cf87ff8f4e7dc51280991809940f26e846c944609ae4726309be73742a830040cd783989f6941e1b41c02405834bc6365059403a59ca9255ac695156235@34.89.75.187:30303,enode://f81234949f791624d1196eb3a780490f5a8199b476c3522335e6d76ca96aa9155ad21c308864b1e22ab9a53136b486520b33515310f8f18485ab471826ae9ded@34.142.43.249:30303,enode://f5cfe35f47ed928d5403aa28ee616fd64ed7daa527b5ae6a7bc412ca25eaad9b6bf2f776144fd9f8e7e9c80b5360a9c03b67f1d47ea88767def7d391cc7e0cd1@34.105.180.11:30303,enode://a36848f536ff6c431e9e3ccbb2f859a5c71f6e5e2d282d8dc6e0199618256444c5032f4cbf7e8579da9fa4d30251b7a55a2d6d3711516112e8dced057c8596c6@34.89.55.74:30303 - --http.timeouts.read=300s - --db.read.concurrency=16384 - --prune.mode=archive - --rpc.batch.concurrency=24 - --bootnodes=enode://e4fb013061eba9a2c6fb0a41bbd4149f4808f0fb7e88ec55d7163f19a6f02d64d0ce5ecc81528b769ba552a7068057432d44ab5e9e42842aff5b4709aa2c3f3b@34.89.75.187:30303,enode://a49da6300403cf9b31e30502eb22c142ba4f77c9dda44990bccce9f2121c3152487ee95ee55c6b92d4cdce77845e40f59fd927da70ea91cf935b23e262236d75@34.142.43.249:30303,enode://0e50fdcc2106b0c4e4d9ffbd7798ceda9432e680723dc7b7b4627e384078850c1c4a3e67f17ef2c484201ae6ee7c491cbf5e189b8ffee3948252e9bef59fc54e@35.234.148.172:30303,enode://a0bc4dd2b59370d5a375a7ef9ac06cf531571005ae8b2ead2e9aaeb8205168919b169451fb0ef7061e0d80592e6ed0720f559bd1be1c4efb6e6c4381f1bdb986@35.246.99.203:30303,enode://f2b0d50e0b843d38ddcab59614f93065e2c82130100032f86ae193eb874505de12fcaf12502dfd88e339b817c0b374fa4b4f7c4d5a4d1aa04f29c503d95e0228@35.197.233.240:30303,enode://72c3176693f7100dfedc8a37909120fea16971260a5d95ceff49affbc0e23968c35655fee75734736f0b038147645e8ceeee59af68859b3f5bf91fe249be6259@35.246.95.65:30303,enode://f0e44769385aea31de930d3f4796e3e348962221063bb9f681106d832d13f70e5543d652d30e819812104f1b1ffdd7585977b46bf802ed5a52cf731de8c48dbd@34.105.180.11:30303,enode://fc7624241515f9d5e599a396362c29de92b13a048ad
361c90dd72286aa4cca835ba65e140a46ace70cc4dcb18472a476963750b3b69d958c5f546d48675880a8@34.147.169.102:30303,enode://198896e373735ba38a0313d073137a413787ece791fbc0d0be0f9f6b9d9dd00ee0841f46519904d666d7f1cdfce5532b093e3a1574b34eb64224f57b9b7fce7b@34.89.55.74:30303,enode://07bc4cf87ff8f4e7dc51280991809940f26e846c944609ae4726309be73742a830040cd783989f6941e1b41c02405834bc6365059403a59ca9255ac695156235@34.89.75.187:30303,enode://f81234949f791624d1196eb3a780490f5a8199b476c3522335e6d76ca96aa9155ad21c308864b1e22ab9a53136b486520b33515310f8f18485ab471826ae9ded@34.142.43.249:30303,enode://f5cfe35f47ed928d5403aa28ee616fd64ed7daa527b5ae6a7bc412ca25eaad9b6bf2f776144fd9f8e7e9c80b5360a9c03b67f1d47ea88767def7d391cc7e0cd1@34.105.180.11:30303,enode://a36848f536ff6c431e9e3ccbb2f859a5c71f6e5e2d282d8dc6e0199618256444c5032f4cbf7e8579da9fa4d30251b7a55a2d6d3711516112e8dced057c8596c6@34.89.55.74:30303 - --torrent.download.rate=120mb - --torrent.staticpeers=enode://e4fb013061eba9a2c6fb0a41bbd4149f4808f0fb7e88ec55d7163f19a6f02d64d0ce5ecc81528b769ba552a7068057432d44ab5e9e42842aff5b4709aa2c3f3b@34.89.75.187:30303,enode://a49da6300403cf9b31e30502eb22c142ba4f77c9dda44990bccce9f2121c3152487ee95ee55c6b92d4cdce77845e40f59fd927da70ea91cf935b23e262236d75@34.142.43.249:30303,enode://0e50fdcc2106b0c4e4d9ffbd7798ceda9432e680723dc7b7b4627e384078850c1c4a3e67f17ef2c484201ae6ee7c491cbf5e189b8ffee3948252e9bef59fc54e@35.234.148.172:30303,enode://a0bc4dd2b59370d5a375a7ef9ac06cf531571005ae8b2ead2e9aaeb8205168919b169451fb0ef7061e0d80592e6ed0720f559bd1be1c4efb6e6c4381f1bdb986@35.246.99.203:30303,enode://f2b0d50e0b843d38ddcab59614f93065e2c82130100032f86ae193eb874505de12fcaf12502dfd88e339b817c0b374fa4b4f7c4d5a4d1aa04f29c503d95e0228@35.197.233.240:30303,enode://72c3176693f7100dfedc8a37909120fea16971260a5d95ceff49affbc0e23968c35655fee75734736f0b038147645e8ceeee59af68859b3f5bf91fe249be6259@35.246.95.65:30303,enode://f0e44769385aea31de930d3f4796e3e348962221063bb9f681106d832d13f70e5543d652d30e819812104f1b1ffdd7585977b46bf80
2ed5a52cf731de8c48dbd@34.105.180.11:30303,enode://fc7624241515f9d5e599a396362c29de92b13a048ad361c90dd72286aa4cca835ba65e140a46ace70cc4dcb18472a476963750b3b69d958c5f546d48675880a8@34.147.169.102:30303,enode://198896e373735ba38a0313d073137a413787ece791fbc0d0be0f9f6b9d9dd00ee0841f46519904d666d7f1cdfce5532b093e3a1574b34eb64224f57b9b7fce7b@34.89.55.74:30303,enode://07bc4cf87ff8f4e7dc51280991809940f26e846c944609ae4726309be73742a830040cd783989f6941e1b41c02405834bc6365059403a59ca9255ac695156235@34.89.75.187:30303,enode://f81234949f791624d1196eb3a780490f5a8199b476c3522335e6d76ca96aa9155ad21c308864b1e22ab9a53136b486520b33515310f8f18485ab471826ae9ded@34.142.43.249:30303,enode://f5cfe35f47ed928d5403aa28ee616fd64ed7daa527b5ae6a7bc412ca25eaad9b6bf2f776144fd9f8e7e9c80b5360a9c03b67f1d47ea88767def7d391cc7e0cd1@34.105.180.11:30303,enode://a36848f536ff6c431e9e3ccbb2f859a5c71f6e5e2d282d8dc6e0199618256444c5032f4cbf7e8579da9fa4d30251b7a55a2d6d3711516112e8dced057c8596c6@34.89.55.74:30303 - --bor.heimdall=http://polygon-pos-mainnet-heimdallv2-0.default.svc.cluster.local:1317

Chain/Network: polygon-pos/mainnet

Expected behaviour

Previously, the size of the chaindata folder stayed roughly constant (around 60G) and we saw no sync issues.

Image

Actual behaviour

Now the chaindata folder keeps growing (~300G and beyond), and our nodes intermittently lag behind the tip of the chain.

Image

As a temporary workaround, we clear the chaindata folder, which allows the node to sync smoothly for a while. However, once the chaindata grows again, the nodes begin to lag behind the chain tip.

ashu1777 avatar Oct 24 '25 06:10 ashu1777

I won't open a separate issue right now, because I think we are experiencing the same problem:

erigon@erigon-0:~$ du -hs /data/*
0       /data/LOCK
52K     /data/bor
28K     /data/caplin
417G    /data/chaindata
32K     /data/downloader
41M     /data/heimdall
339M    /data/logs
16K     /data/lost+found
24K     /data/migrations
4.0K    /data/nodekey
48M     /data/nodes
4.0M    /data/polygon-bridge
5.1T    /data/snapshots
3.7G    /data/temp
1.1G    /data/txpool
erigon@erigon-0:~$

MrFreezeDZ avatar Oct 24 '25 13:10 MrFreezeDZ

The log lines that seem suspicious to me are these, because I think Erigon is taking too long to finish these "cycles":

{"alloc":"27.7GB", "lvl":"info", "msg":"[4/6 Execution][agg] computing trie", "progress":"565.05k/1.89M", "sys":"77.6GB", "t":"2025-10-24T13:52:18.167333347Z"}

@ashu1777 is this also the case on your Erigon instance, or should I open a separate issue?
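To compare how far along these "computing trie" cycles are between nodes, the `progress` field of the log line above can be turned into a fraction. This is only a minimal sketch: it assumes the `k` and `M` suffixes mean thousand and million, and the helper names `parse_count` and `trie_progress` are made up for illustration, not part of Erigon.

```python
import json

# Multipliers for the abbreviated counts seen in the log line ("k", "M").
# Assumption: "k" = thousand and "M" = million in Erigon's log output.
_SUFFIXES = {"k": 1e3, "M": 1e6}


def parse_count(text: str) -> float:
    """Convert an abbreviated count such as '565.05k' or '1.89M' to a number."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    return float(text)


def trie_progress(log_line: str) -> float:
    """Return done/total from the 'progress' field of one JSON log line."""
    record = json.loads(log_line)
    done, total = record["progress"].split("/")
    return parse_count(done) / parse_count(total)


# The log line quoted in this comment.
line = (
    '{"alloc":"27.7GB", "lvl":"info", "msg":"[4/6 Execution][agg] computing trie", '
    '"progress":"565.05k/1.89M", "sys":"77.6GB", "t":"2025-10-24T13:52:18.167333347Z"}'
)
print(f"{trie_progress(line):.1%}")  # prints 29.9%
```

Running this over two log lines and dividing the change in progress by the difference of their `t` timestamps would give a rough speed for the cycle.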

MrFreezeDZ avatar Oct 24 '25 13:10 MrFreezeDZ

Same issue here. Our node hit the 1TB limit, which forced us to increase --db.size.limit from its default of 1TB.

angusscott avatar Oct 31 '25 11:10 angusscott

Can you give a try with version https://github.com/0xPolygon/erigon/releases/tag/v3.3.0? Thanks

marcello33 avatar Nov 20 '25 15:11 marcello33

> Can you give a try with version https://github.com/0xPolygon/erigon/releases/tag/v3.3.1-beta2? Thanks

@marcello33 Would you recommend this for mainnet?

avinashbo avatar Nov 22 '25 02:11 avinashbo

> > Can you give a try with version https://github.com/0xPolygon/erigon/releases/tag/v3.3.1-beta2? Thanks
>
> @marcello33 Would you recommend this for mainnet?

Sorry @avinashbo, corrected to the right version

marcello33 avatar Nov 22 '25 04:11 marcello33

Did it solve the chaindata size increase issue? Btw, the Erigon team is aware of the sync problems upstream. We're currently debugging and trying to provide a stable solution for this problem. Thanks for your patience.

marcello33 avatar Nov 30 '25 11:11 marcello33

@marcello33 chaindata will continue to grow until new ottersync segments are generated. Am I missing something? The last segments were generated for the 078448-078449 block interval. Because they are so old, I cannot do maintenance tasks such as deleting the chaindata DB, since it would take ages to regenerate from 078449.

andreicupaciu avatar Nov 30 '25 16:11 andreicupaciu

> Did it solve the chaindata size increase issue? Btw, erigon team is aware of the sync problems upstream. We're currently debugging and trying to provide a stable solution for this problem. Thanks for your patience

Shall we try v3.3.2 now that the stable version is available, @marcello33? Is it expected to address the chaindata growth?

avinashbo avatar Dec 02 '25 09:12 avinashbo

@avinashbo you can try that, yes; however, I suspect it won't help with the chaindata growth. We talked with the Erigon team and received a suggestion on how to analyze this. Basically, the issue is that /data/chaindata is not being pruned properly, and the bigger it gets, the slower new blocks are processed. A temporary resolution is to delete it, and you can help us debug it via the following process:

  1. stop the erigon service
  2. sudo du -sh /data/chaindata/* (to get the current size of it)
  3. remove (or rename as backup) /data/chaindata folder (it only contains two files: mdbx.dat and mdbx.lck)
  4. start the erigon service again
  5. Monitor whether it gains any speed and keep an eye on the /data/chaindata growth

You can report this back to us; in the meantime, we are working to validate how to improve the pruning so the folder no longer grows this much. Thanks 🙏
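For the monitoring step, a rough way to put a number on the growth is to sample the folder size at intervals and compute a rate. The following is a minimal sketch, not an Erigon tool: `dir_size_bytes` and `growth_gib_per_hour` are hypothetical helper names, and the size measured is the apparent file size (comparable to `du -sb`), which can differ from `du -sh` if the database file is sparse.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


def growth_gib_per_hour(samples: list[tuple[float, int]]) -> float:
    """Average growth between the first and last (unix_seconds, size_bytes) sample."""
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval, oldest first")
    gib_grown = (s1 - s0) / 2**30
    hours = (t1 - t0) / 3600
    return gib_grown / hours
```

One way to use it is to record `(time.time(), dir_size_bytes("/data/chaindata"))` every few minutes, for example from cron, and report the resulting rate along with the node's sync status.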

marcello33 avatar Dec 02 '25 09:12 marcello33

@marcello33 Hello! I started syncing Erigon on mainnet, but I don't know why it only downloads torrents up to block 76636999. After syncing around 300-400k blocks, it got stuck on the pruning stage and the chaindata folder grew to 350+ GB.

I did as you said: stopped Erigon, removed /data/chaindata/*, and restarted. After some time syncing, it showed this error and restarted my container:

{"err":"pos sync failed: unexpected bad block at finalized waypoint: fork choice update bad block: status=1, validationErr='updateForkChoice: invalid block, txnIdx=76, gas used by execution: 13361555, in header: 13361522, headerNum=77451310, 9e01584707accd1c4dcb31ae4568d6d73fdb07ea1439156f418f7d212c258ea2'","lvl":"eror","msg":"[polygon.sync] crashed - stopping node","t":"2025-12-05T17:28:39.128861052Z"}

Is there a way to just sync Erigon to the current block?
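In case it helps when reporting similar crashes, the gas mismatch in that error can be pulled out mechanically. This is a small sketch that assumes the message keeps the exact wording shown above; the regex and the `gas_mismatch` helper are illustrative only and not part of Erigon.

```python
import re

# Pattern matching the wording of the validation error quoted above.
# Assumption: other Erigon versions may phrase this message differently.
_GAS_RE = re.compile(
    r"txnIdx=(?P<txn>\d+), gas used by execution: (?P<executed>\d+), "
    r"in header: (?P<header>\d+), headerNum=(?P<block>\d+)"
)


def gas_mismatch(err: str) -> dict:
    """Extract txn index, block number, and both gas values, plus their difference."""
    match = _GAS_RE.search(err)
    if match is None:
        raise ValueError("no gas mismatch found in error text")
    fields = {key: int(value) for key, value in match.groupdict().items()}
    fields["delta"] = fields["executed"] - fields["header"]
    return fields


err = (
    "updateForkChoice: invalid block, txnIdx=76, gas used by execution: 13361555, "
    "in header: 13361522, headerNum=77451310, "
    "9e01584707accd1c4dcb31ae4568d6d73fdb07ea1439156f418f7d212c258ea2"
)
info = gas_mismatch(err)
print(info["block"], info["txn"], info["delta"])  # prints 77451310 76 33
```

Here local execution used 33 more gas than the block header records, at transaction index 76 of block 77451310.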

Skryabin-P avatar Dec 05 '25 17:12 Skryabin-P