
Database compacting stops syncing chain

Open koen84 opened this issue 3 years ago • 11 comments

System information

Geth version: 1.0.7
OS & Version: Ubuntu 18.04

Expected behaviour

BSC keeps syncing the chain. The node shouldn't automatically start work that degrades performance, let alone effectively halts usage.

Actual behaviour

lvl=warn msg="Database compacting, degraded performance"

These happened automatically and caused chain syncing to stop dead.

Note: the disk was 95% full at the time.

koen84 commented May 18 '21 00:05

Compaction took 8 hours, then the node started to catch up. The catch-up coincided with snapshot resume + abort cycles (which slow chain sync a lot). Meanwhile it's back to compacting and not syncing at all.

This is incredibly frustrating, as it makes an expensive BSC archive node unusable.

koen84 commented May 18 '21 21:05

@guagualvcha any ideas? The node keeps halting chain sync to do database compacting, which prevents it from catching the chain head.

koen84 commented May 19 '21 17:05

And it's back with this idiocy. I'm ranging 90-95% disk used, which means 589G left at 92% (you can still run a fast node in that amount of storage). While another disk upgrade is planned, there is still plenty of leeway to operate normally.

koen84 commented Jun 07 '21 13:06

Would love some support on this. During times of network congestion it's taking longer to sync to the chain head, which impacts our graph indexing, which then trickles down to the user interface.

DeFiFoFum commented Jul 29 '21 18:07

@j75689 please take a look at this issue w.r.t. your call for feedback on syncing problems.

Even if I sync at 40-55 blocks per minute (compared to a chain speed of ~20 bpm) I can catch up all I want, but I still get rekt hard when the BSC node decides on its own to do database compacting and effectively stops syncing the chain. I run 3 BSC archive nodes, and in recent days all of them keep going into long database compacting runs, often and at different times (sometimes all 3 simultaneously), leaving us without an up-to-chainhead archive node.

The biggest problem is that there seems to be no control over if/when this happens. We're already running --nocompaction on all nodes, but it is ignored (or meant for something else).
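
For what it's worth, --nocompaction only seems to skip the compaction step after an offline import, so it wouldn't affect LevelDB's background compaction during normal operation. As a stopgap, compaction can at least be run on a schedule of our choosing through the console's debug namespace, assuming debug.chaindbCompact is available in the BSC build in use (the IPC path below is just an example):

# Force a full chaindata compaction during a quiet maintenance window
# (debug namespace is normally exposed over the node's IPC endpoint)
geth attach --exec 'debug.chaindbCompact()' /path/to/datadir/geth.ipc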

I'm seeing this behaviour on versions 1.0.7-ht3, 1.1.0-beta & 1.1.1-beta

koen84 commented Aug 01 '21 09:08

Agreed, this is an important issue to fix asap. It's difficult running a reliable production setup, even with the best dedicated hardware, if the software takes itself offline for a maintenance task without the sysop instructing it to.

Imagine Apache deciding to stop serving web requests on a whim.

MindHeartSoul commented Aug 01 '21 09:08

Also, current disk usage is:

  • 85% (2.2 TiB free)
  • 71% (3.8 TiB free)
  • 88% (1.3 TiB free)

So disk usage seems unlikely to be a factor, especially seeing all 3 nodes go into database compaction.
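
If disk pressure were the trigger, LevelDB's own statistics should show it; geth can dump them through the console (again assuming the debug namespace is reachable, and using an example IPC path):

# Dump LevelDB's internal level/compaction statistics
geth attach --exec 'debug.chaindbProperty("leveldb.stats")' /path/to/datadir/geth.ipc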

koen84 commented Aug 01 '21 10:08

I have the same issue trying to sync a full archive node. The node syncs at a rate of 90-110 mgasps for a few hours, but then compaction runs for 4-6 hours; after that the node syncs for another 1-2 hours and the database starts compacting again. It's hardly related to disk I/O, because I am using a Samsung PM1725b SSD with an AMD EPYC 7742 64-core processor.

INFO [12-08|12:44:35.876] Imported new chain segment               blocks=28  txs=5734  mgas=824.262  elapsed=8.009s       mgasps=102.915 number=8,151,638 hash=2b4df1..234bec age=6mo1d20h  dirty=2.92MiB
WARN [12-08|12:45:11.752] Database compacting, degraded performance database=/mnt/bsc/data/geth/chaindata
....
WARN [12-08|16:03:39.974] Database compacting, degraded performance database=/mnt/bsc/data/geth/chaindata
INFO [12-08|16:03:51.620] Writing clean trie cache to disk         path=/mnt/bsc/data/geth/triecache threads=1
INFO [12-08|16:03:51.621] Regenerated local transaction journal    transactions=0 accounts=0
INFO [12-08|16:04:07.771] Persisted the clean trie cache           path=/mnt/bsc/data/geth/triecache elapsed=16.150s
WARN [12-08|16:04:42.969] Database compacting, degraded performance database=/mnt/bsc/data/geth/chaindata

This compaction takes a long time: the node was able to sync up to block 6,000,000 in 2 days, but from that block up to block 8,000,000+ it took more than 10 days. The node was started with the following command:

./geth --datadir /mnt/bsc/data --ethash.cachesinmem 6 --ethash.cachesondisk 0 --ethash.dagsinmem 4 \
--ethash.dagsondisk 0 --ws --ws.port 13334 --ws.api eth,net,web3 --syncmode full --gcmode archive --cache 32768 \
--http.api eth,web3,admin,debug,txpool,net --http --txpool.accountslots 100000 --txpool.globalslots 100000 --txpool.accountqueue 100000 \
--txpool.globalqueue 100000 --txpool.lifetime 24h0m0s --maxpeers 64 --txlookuplimit=0 \
--cache.preimages --rpc.allow-unprotected-txs --config /mnt/bsc/data/config.toml --nocompaction --diffsync
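
Since the warnings are timestamped, a rough way to see how much wall-clock time goes to compaction is to bucket the warnings per hour. This assumes the node's output is captured to a log file; the path below is just an example:

# Count "Database compacting" warnings per date-hour bucket
grep 'Database compacting, degraded performance' /mnt/bsc/geth.log \
  | awk -F'[][]' '{print $2}' \
  | cut -d: -f1 \
  | sort | uniq -c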

alexqrid commented Dec 08 '21 16:12

I'm actually at a point where my archive nodes are compacting literally endlessly (as in, for weeks).

koen84 commented Dec 21 '21 23:12

Same issue here. We are syncing from scratch on an i3en.12xlarge without success. Any tips or a how-to for a full archive node from Binance?

derwin4o commented Apr 26 '22 14:04

The same issue.... Is there any solution to this problem?

BabySid commented Jun 06 '22 03:06