Database compacting stops chain syncing
System information
Geth version: 1.0.7
OS & Version: Ubuntu 18.04
Expected behaviour
BSC keeps syncing the chain. The node should not automatically start maintenance work that degrades performance, let alone effectively halts usage.
Actual behaviour
lvl=warn msg="Database compacting, degraded performance"
This happened automatically and brought chain syncing to a dead stop.
Note: the disk was 95% full.
Compaction took 8h, then the node started to catch up. The catch-up was combined with snapshot generation resuming + aborting (which slows chain sync a lot). Meanwhile it's back to compacting and not syncing at all.
This is incredibly frustrating, as it makes an expensive BSC archive node unusable.
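To quantify the stall, here is a rough way to pull the compaction windows out of the logs (the path /var/log/bsc/geth.log below is just a placeholder; adjust to wherever your service writes):

  # count how often the warning fires
  grep -c "Database compacting, degraded performance" /var/log/bsc/geth.log
  # interleave compaction warnings with block imports to see how long each window lasts
  grep -E "Database compacting|Imported new chain segment" /var/log/bsc/geth.log | tail -n 100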
@guagualvcha any ideas? The node keeps doing this "stopping chain sync for doing database compacting", preventing it from catching the chain head.
And it's back with this idiocy. I'm ranging 90-95% disk used, which means 589G at 92% (you can still run a fast node in that amount of storage). While another disk upgrade is planned, there is still plenty of leeway to operate normally.
Would love some support on this. During times of network congestion it's taking longer to sync to the chain head, which impacts the graph, which then trickles down to the user interface.
@j75689 please take a look at this issue w.r.t. your call for feedback on syncing problems.
Even if I sync at 40-55 blocks per minute (compared to a chain speed of ~20 bpm), I can catch up all I want but still get hit hard when the BSC node decides for itself that it will do database compacting and effectively stops syncing the chain. I run 3 BSC archive nodes, and in recent days all of them frequently go into long database compactions (at different times, sometimes all 3 simultaneously), leaving us without an up-to-chain-head archive node.
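To put rough numbers on that (back-of-the-envelope, using the ~20 bpm chain speed and the ~8h compaction mentioned above):

  net catch-up rate:  40-55 bpm synced - ~20 bpm produced = 20-35 blocks/min gained
  one 8h stall:       8 * 60 * ~20 bpm ≈ 9,600 blocks fallen behind
  time to recover:    9,600 / (20-35) ≈ 4.5-8 hours of uninterrupted syncing

So every multi-hour compaction costs roughly as many hours of clean syncing again, and with compactions recurring this often the nodes never stay at the chain head.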
The biggest problem is that there seems to be no control over if/when this happens. We're already running --nocompaction on all nodes, but that is ignored (or meant for something else).
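For what it's worth: if the flag help text is anything to go by, --nocompaction only skips the compaction pass that runs after an offline geth import, so it would not touch LevelDB's background compaction, which is what seems to be firing here. The only manual handle I know of is the console debug API (a sketch; /your/datadir/geth.ipc is a placeholder for the IPC socket under each node's datadir):

  # inspect LevelDB's own compaction statistics on a running node
  geth attach --exec 'debug.chaindbProperty("leveldb.stats")' /your/datadir/geth.ipc
  # trigger a full compaction manually during a planned maintenance window
  geth attach --exec 'debug.chaindbCompact()' /your/datadir/geth.ipc

Note that debug.chaindbCompact() on a multi-TB archive database will itself run for hours, so it only helps if you can schedule it yourself.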
I'm seeing this behaviour on versions 1.0.7-ht3, 1.1.0-beta & 1.1.1-beta
Agreed, this is an important issue to fix asap. It's difficult to run a reliable production setup, even with the best dedicated hardware, if the software takes itself offline for a maintenance task without the sysop instructing it to.
Imagine Apache stopping to serve web requests on a whim.
Also, current disk usage is:
- 85% (2.2 TiB free)
- 71% (3.8 TiB free)
- 88% (1.3 TiB free)
So disk usage seems unlikely to be a factor, especially seeing all 3 nodes go into database compaction.
I have the same issue trying to sync a full archive node. The node syncs at a 90-110 mgasps rate for a few hours, but then compaction runs for 4-6 hours; after that the node syncs for 1-2 hours and the database starts compacting again. It's hardly related to disk I/O, because I am using a Samsung PM1725b SSD drive with an AMD EPYC 7742 64-core processor.
INFO [12-08|12:44:35.876] Imported new chain segment blocks=28 txs=5734 mgas=824.262 elapsed=8.009s mgasps=102.915 number=8,151,638 hash=2b4df1..234bec age=6mo1d20h dirty=2.92MiB
WARN [12-08|12:45:11.752] Database compacting, degraded performance database=/mnt/bsc/data/geth/chaindata
....
WARN [12-08|16:03:39.974] Database compacting, degraded performance database=/mnt/bsc/data/geth/chaindata
INFO [12-08|16:03:51.620] Writing clean trie cache to disk path=/mnt/bsc/data/geth/triecache threads=1
INFO [12-08|16:03:51.621] Regenerated local transaction journal transactions=0 accounts=0
INFO [12-08|16:04:07.771] Persisted the clean trie cache path=/mnt/bsc/data/geth/triecache elapsed=16.150s
WARN [12-08|16:04:42.969] Database compacting, degraded performance database=/mnt/bsc/data/geth/chaindata
This compaction is taking a long time: the node was able to sync up to block 6,000,000 in 2 days, but getting from that block to block 8,000,000+ took more than 10 days.
The node was started with the following command:
./geth --datadir /mnt/bsc/data --ethash.cachesinmem 6 --ethash.cachesondisk 0 --ethash.dagsinmem 4 \
--ethash.dagsondisk 0 --ws --ws.port 13334 --ws.api eth,net,web3 --syncmode full --gcmode archive --cache 32768 \
--http.api eth,web3,admin,debug,txpool,net --http --txpool.accountslots 100000 --txpool.globalslots 100000 --txpool.accountqueue 100000 \
--txpool.globalqueue 100000 --txpool.lifetime 24h0m0s --maxpeers 64 --txlookuplimit=0 \
--cache.preimages --rpc.allow-unprotected-txs --config /mnt/bsc/data/config.toml --nocompaction --diffsync
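To see how much time the node actually spends compacting (rather than inferring it from the warning log), one option is geth's metrics; a sketch, assuming the upstream metrics flags and LevelDB meter names are unchanged in this BSC build:

  # add to the command above:
  --metrics --metrics.addr 127.0.0.1 --metrics.port 6060
  # then scrape the compaction meters:
  curl -s http://127.0.0.1:6060/debug/metrics/prometheus | grep -i compact

If present in this version, the eth/db/chaindata/compact/* meters report cumulative compaction time and bytes moved, which makes it easier to correlate the stalls with write load.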
I'm actually at a point where my archive nodes are compacting literally endlessly (as in weeks).
The same issue... We are syncing on an i3en.12xlarge from scratch without success. Any tips or a how-to for a full archive node from Binance?
The same issue... Is there any solution to this problem?