
Warp sync triggers OOM on Astar

Open bLd75 opened this issue 1 year ago • 7 comments

Description

Warp sync is not operational on Astar in the latest versions: after downloading the state (5.3+ GB), state import triggers an OOM on the server.

Steps to Reproduce

Start an Astar node sync with the --sync warp option
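A minimal sketch of the reproduction (flags beyond `--sync warp` are illustrative, not a known-good configuration; assumes the `astar-collator` binary is on PATH):

```shell
# Hypothetical minimal invocation; the base path is an example.
astar-collator \
  --chain astar \
  --sync warp \
  --base-path /tmp/astar-warp-test
```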

Environment

Quite similar to this issue, but on the para side. ~~Issue will be solved after uplifting to Polkadot v1.0.0~~

bLd75 avatar Dec 19 '23 10:12 bLd75

It is resolved, right?

ashutoshvarma avatar May 17 '24 10:05 ashutoshvarma

@ashutoshvarma we currently have an ongoing OOM case on the latest Astar version. @paradox-tt will provide details here.

bLd75 avatar Jun 12 '24 07:06 bLd75

We're still uplifting & catching up to the latest version.

But please do provide the command, environment & logs if you have them.

Dinonard avatar Jun 19 '24 14:06 Dinonard

Hey team,

Here are my flags, with the public address hidden:

ExecStart=/usr/local/bin/astar-collator \
  --validator \
  --rpc-cors all \
  --name Dox-Astar-01 \
  --execution wasm \
  --state-cache-size 1 \
  --chain astar \
  --public-addr=/ip4/x.x.x.x/tcp/30330 \
  --listen-addr=/ip4/172.19.12.15/tcp/30330 \
  --bootnodes /ip4/20.93.150.146/tcp/30330/p2p/12D3KooWKZwcaofXPmXWHSSfnh34VFJ8zSRJScnNu9UA75x8kNXi \
  --allow-private-ipv4 \
  --discover-local \
  --rpc-port=9110 \
  --prometheus-external \
  --prometheus-port=9702 \
  --rpc-methods=Unsafe \
#  --sync=warp \
  --blocks-pruning=1000 \
  --state-pruning=1000 \
  --telemetry-url 'wss://telemetry-backend.w3f.community/submit/ 1' \
  --telemetry-url 'wss://telemetry.polkadot.io/submit/ 1' \
#  --relay-chain-rpc-urls "wss://rpc.ibp.network/polkadot" \

There's no error in the logs, except that warping continues until the server runs out of memory or the instance reboots.

Jun 12 11:23:56 doxastar astar-collator[52243]: 2024-06-12 11:23:56 [Parachain] ⏩ Warping, Downloading state, 406.43 Mib (22 peers), best: #0 (0x9eb7…29c6), finalized #0 (0x9eb7…29c6), ⬇ 0.7kiB/s ⬆ 0.4kiB/s
Jun 12 11:24:00 doxastar astar-collator[52243]: 2024-06-12 11:24:00 [Relaychain] ✨ Imported #21184086 (0x5200…14a5)
Jun 12 11:24:01 doxastar astar-collator[52243]: 2024-06-12 11:24:01 [Relaychain] 💤 Idle (15 peers), best: #21184086 (0x5200…14a5), finalized #21184083 (0xb341…3109), ⬇ 145.2kiB/s ⬆ 192.6kiB/s
Jun 12 11:24:02 doxastar astar-collator[52243]: 2024-06-12 11:24:01 [Parachain] ⏩ Warping, Downloading state, 409.57 Mib (22 peers), best: #0 (0x9eb7…29c6), finalized #0 (0x9eb7…29c6), ⬇ 272.5kiB/s ⬆ 0.9kiB/s
Jun 12 11:24:06 doxastar astar-collator[52243]: 2024-06-12 11:24:06 [Relaychain] ✨ Imported #21184087 (0x6345…6470)
-- Boot 5e8c89c8388a471daea298612802f1e0 --
Jun 12 11:27:01 doxastar systemd[1]: Started Astar Node.
Jun 12 11:27:01 doxastar astar-collator[738]: `--state-cache-size` was deprecated. Please switch to `--trie-cache-size`.
Jun 12 11:27:01 doxastar astar-collator[738]: CLI parameter `--execution` has no effect anymore and will be removed in the future!
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 Astar Collator
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 ✌️  version 5.39.1-111d18fbfba
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 ❤️  by Stake Technologies <[email protected]>, 2019-2024
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 📋 Chain specification: Astar
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 🏷  Node name: Dox-Astar-01
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 👤 Role: AUTHORITY
Jun 12 11:27:01 doxastar astar-collator[738]: 2024-06-12 11:27:01 💾 Database: RocksDb at /home/astar_1/.local/share/astar-collator/ch

paradox-tt avatar Jun 21 '24 07:06 paradox-tt

Update on the pre-v5.42.0 client test: the issue is still the same. In my tests with 32 GB RAM, the node always hits OOM at the same point: importing state at 5762.42 Mib. Once it reaches that state size, memory suddenly fills, bursting to 100% in under 2 minutes. [screenshot: memory usage graph]

bLd75 avatar Jul 16 '24 09:07 bLd75

I can't see any significant correlation to disk usage, meaning the problem is isolated to RAM usage by warp sync. [screenshot: disk usage graph]
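To correlate the RAM spike with the collator process itself (rather than system-wide graphs), the process RSS can be sampled from `/proc`. A Linux-only sketch with a hypothetical helper, not part of any Astar tooling:

```python
import time

def rss_mib(pid: str = "self") -> float:
    """Read the resident set size (VmRSS) of a process from /proc, in MiB.

    Linux-only; pass the collator's PID as a string, e.g. rss_mib("52243").
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # VmRSS is reported in kB
    return 0.0

# Example: sample our own RSS a few times, once per second.
for _ in range(3):
    print(f"{rss_mib():.1f} MiB")
    time.sleep(1)
```

Logging this alongside the reported state size would show whether memory grows linearly with the downloaded state or spikes only at the import step.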

bLd75 avatar Jul 16 '24 09:07 bLd75

More insights on memory over a short time frame: [screenshots: memory usage panels]

bLd75 avatar Jul 16 '24 09:07 bLd75