gracefully handle out of disk space failures
Describe the bug
grin fails to recover gracefully from a crash resulting from running out of disk space.
To Reproduce
Steps to reproduce the behavior:
- Run grin
- Run out of disk space; grin panics
- Increase disk space and restart grin; it panics on startup with "failed to find head hash" (see the log under Relevant Information)
Relevant Information
20200817 16:51:27.347 INFO grin - Using configuration file at /home/jameson/.grin/main/grin-server.toml
20200817 16:51:27.347 INFO grin - This is Grin version 4.0.2 (git v4.0.2), built for x86_64-unknown-linux-gnu by rustc 1.45.2 (d3fb005a3 2020-07-31).
20200817 16:51:27.347 DEBUG grin - Built with profile "release", features "".
20200817 16:51:27.347 INFO grin - Chain: Mainnet
20200817 16:51:27.347 INFO grin - Feature: NRD kernel enabled: false
20200817 16:51:27.347 WARN grin::cmd::server - Starting GRIN in UI mode...
20200817 16:51:27.354 INFO grin_servers::grin::server - Starting server, genesis block: 40adad0aec27
20200817 16:51:27.358 DEBUG grin_store::lmdb - DB Mapsize for /home/jameson/.grin/main/chain_data/lmdb is 549755813888
20200817 16:51:27.431 DEBUG grin_store::leaf_set - bitmap 162820 pos (315706 bytes)
20200817 16:51:29.779 DEBUG grin_store::prune_list - bitmap 478437 pos (718704 bytes), pruned_cache 6843301 pos (772299 bytes), shift_cache 478437, leaf_shift_cache 478437
20200817 16:51:29.920 DEBUG grin_store::leaf_set - bitmap 162820 pos (315706 bytes)
20200817 16:51:32.365 DEBUG grin_store::prune_list - bitmap 478437 pos (718704 bytes), pruned_cache 6843301 pos (772299 bytes), shift_cache 478437, leaf_shift_cache 478437
20200817 16:51:32.407 DEBUG grin_chain::txhashset::bitmap_accumulator - applied 3777 chunks from idx 0 to idx 3776 (41ms)
20200817 16:51:34.074 DEBUG grin_chain::txhashset::txhashset - attempting to open kernel PMMR using ProtocolVersion(2) - FAIL (verify failed)
20200817 16:51:34.117 DEBUG grin_chain::txhashset::txhashset - attempting to open kernel PMMR using ProtocolVersion(1) - SUCCESS
20200817 16:51:34.325 ERROR grin_util::logger -
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Chain(Error { inner:
Other Error: failed to find head hash })': src/bin/cmd/server.rs:48 0: grin_util::logger::send_panic_to_log::{{closure}}
1: std::panicking::rust_panic_with_hook
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:490
2: rust_begin_unwind
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:388
3: core::panicking::panic_fmt
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libcore/panicking.rs:101
4: core::option::expect_none_failed
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libcore/option.rs:1272
5: grin::cmd::server::start_server_tui
6: grin::cmd::server::server_command
7: grin::real_main
8: grin::main
9: std::rt::lang_start::{{closure}}
10: std::rt::lang_start_internal::{{closure}}
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/rt.rs:52
std::panicking::try::do_call
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:297
std::panicking::try
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:274
std::panic::catch_unwind
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panic.rs:394
std::rt::lang_start_internal
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/rt.rs:51
11: main
12: __libc_start_main
13: _start
Desktop (please complete the following information):
- OS: Ubuntu 18.04
I ran grin --clean and it appears to have wiped all 6 GB of chain data; the node is now resyncing from genesis.
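For context, the backtrace above shows the panic coming from a `Result::unwrap()` reached through `grin::cmd::server::start_server_tui`. Below is a minimal sketch of what reporting the error instead of panicking could look like. Everything in it (`StartupError`, `open_chain`, `start`) is a hypothetical stand-in for illustration, not grin's actual types or API.

```rust
use std::fmt;

/// Hypothetical startup error; grin's real chain error type differs.
#[derive(Debug)]
pub enum StartupError {
    /// Chain data on disk is unreadable or inconsistent.
    CorruptChainData(String),
    /// Any other startup failure.
    Other(String),
}

impl fmt::Display for StartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StartupError::CorruptChainData(msg) => write!(
                f,
                "chain data appears corrupt ({}); it may need to be repaired or resynced",
                msg
            ),
            StartupError::Other(msg) => write!(f, "startup failed: {}", msg),
        }
    }
}

/// Hypothetical stand-in for opening the chain store.
/// `head_hash_present` simulates whether the head hash could be found.
pub fn open_chain(head_hash_present: bool) -> Result<(), StartupError> {
    if head_hash_present {
        Ok(())
    } else {
        Err(StartupError::CorruptChainData(
            "failed to find head hash".to_string(),
        ))
    }
}

/// Returns a process exit code instead of unwrapping and panicking.
pub fn start(head_hash_present: bool) -> i32 {
    match open_chain(head_hash_present) {
        Ok(()) => 0,
        Err(e) => {
            // Print an actionable message rather than a backtrace.
            eprintln!("error: {}", e);
            1
        }
    }
}
```

The point of the sketch is only that the failure is matched and turned into a readable message and a non-zero exit code. Whether the node should also attempt an automatic repair at that point is a separate design question.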
Hey @jlopp. Thanks for reporting this.
We have a couple of "known" edge cases where file corruption can occur on non-clean shutdown. Running out of disk space is likely to exercise at least one of those.
I'd like to take another look at how we handle writing files to disk (these are the global MMR files) if we can get some time to do so. Hopefully we can get this into a more robust state prior to the final scheduled hardfork early next year.
Related - https://github.com/mimblewimble/grin/pull/3266
Also related - https://github.com/mimblewimble/grin/issues/3352
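To illustrate the kind of hardening being discussed for file writes, here is a minimal sketch of one common technique: write to a temporary file, fsync it, then rename it over the target. This is not how grin currently writes its MMR files, and the function names and the `.tmp` extension are assumptions made for the example. It uses only the Rust standard library.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to the temporary file and flush it to disk.
fn write_tmp(tmp: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = File::create(tmp)?;
    f.write_all(data)?;
    // Make sure the bytes are on disk before the rename makes them visible.
    f.sync_all()
}

/// Replace the file at `path` with `data` (hypothetical helper).
/// If any step fails, for example with "no space left on device",
/// the existing file at `path` is left untouched.
pub fn write_atomically(path: &Path, data: &[u8]) -> io::Result<()> {
    // Assumed naming scheme: same path with a `.tmp` extension.
    let tmp = path.with_extension("tmp");
    let result = write_tmp(&tmp, data).and_then(|_| fs::rename(&tmp, path));
    if result.is_err() {
        // Best-effort cleanup of the partial temporary file.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

On POSIX filesystems a rename within the same directory replaces the target atomically, so a reader sees either the old contents or the new ones. A fuller version would also fsync the parent directory so the rename itself survives a power loss. Append-only files would need a different approach, such as recording the last known-good length and truncating back to it on startup.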