
DB corrupted: Corruption: force_consistency_checks

Open polkalegos opened this issue 3 years ago • 2 comments

The DB gets corrupted and my node gets stuck in a restart loop. The only working solution was to resync, which incurred downtime.

  • Role: validator
  • Running on Docker images v0.9.26
  • Flags: "--validator", "--name=legos-x", "--chain=kusama", "--prometheus-external", "--prometheus-port=9615", "--pruning=1000", "--telemetry-url", "wss://telemetry-backend.w3f.community/submit 1"
  • Logs:
2022-07-24 05:08:13 DB corrupted: Corruption: force_consistency_checks: VersionBuilder: L6 files are not sorted properly: files #25645794, #25645968. Repair will be triggered on next restart
2022-07-24 05:08:13 GRANDPA voter error: could not complete a round on disk: Database
2022-07-24 05:08:13 Essential task `grandpa-voter` failed. Shutting down service.
Error:
   0: Other: Essential task failed.

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   1: __libc_start_main<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.
...
...
...
2022-07-24 05:08:16 ⛓  Native runtime: kusama-9260 (parity-kusama-0.tx12.au2)
2022-07-24 05:08:18 DB has been previously marked as corrupted, attempting repair
Error:
   0: Backend error: Corruption: force_consistency_checks: VersionBuilder: L0 file #45913943 with seqno 3226520822 3226530550 vs. file #45914397 with seqno 3226529071 3226530539

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   1: __libc_start_main<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.
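
For anyone who wants to catch this before the node enters a restart loop, here is a minimal sketch of a log scan. The marker strings are copied from the log lines above; the function name `find_corruption_lines` is hypothetical, and the assumption is that the node log is available as plain text.

```python
# Marker substrings taken from the log lines quoted in this issue.
CORRUPTION_MARKERS = (
    "DB corrupted:",
    "DB has been previously marked as corrupted",
    "Backend error: Corruption:",
)


def find_corruption_lines(log_text):
    """Return (line_number, line) pairs for lines containing a corruption marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in CORRUPTION_MARKERS):
            hits.append((number, line.strip()))
    return hits
```

The result could feed an alert or a check that runs before the container is restarted. It only detects the messages; it does not repair anything.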

polkalegos avatar Jul 25 '22 12:07 polkalegos

We also experience this issue; in our case the underlying cause tends to be out-of-memory errors. It would be ideal if those errors were caught and the database closed cleanly when they occur. We run archive nodes, so a resync is a huge problem, with the DB soon reaching 500 GB.

rvalle avatar Jul 25 '22 14:07 rvalle

CC @arkpar

sandreim avatar Jul 26 '22 06:07 sandreim

Same error on v0.9.29

polkalegos avatar Oct 02 '22 12:10 polkalegos

This looks like a rocksdb issue. I suggest switching to --database=paritydb
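
As a rough sketch, the flag list from the original report could be adjusted like this. The helper name `with_database` is hypothetical; only `--database=paritydb` comes from this thread. Whether existing RocksDB data carries over is not covered here, so it is safest to assume the node syncs from scratch after the switch.

```python
def with_database(flags, backend="paritydb"):
    """Return a copy of flags with any existing --database option replaced."""
    result = []
    skip_next = False
    for flag in flags:
        if skip_next:
            # Drop the value that followed a bare "--database" flag.
            skip_next = False
            continue
        if flag == "--database":
            skip_next = True
            continue
        if flag.startswith("--database="):
            continue
        result.append(flag)
    result.append(f"--database={backend}")
    return result


# Flags as listed in the original report.
flags = [
    "--validator", "--name=legos-x", "--chain=kusama",
    "--prometheus-external", "--prometheus-port=9615", "--pruning=1000",
    "--telemetry-url", "wss://telemetry-backend.w3f.community/submit 1",
]
new_flags = with_database(flags)
```

The other flags are left untouched, so the same list can be passed to the container command as before.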

arkpar avatar Oct 03 '22 09:10 arkpar

Isn't that database the alternative that has not been recommended so far? @arkpar

polkalegos avatar Oct 03 '22 13:10 polkalegos

ParityDB is not experimental anymore, so you can use it.

bkchr avatar Oct 04 '22 08:10 bkchr