forest
forest copied to clipboard
Terminate Forest gracefully
Describe the bug
When sending SIGTERM (ctrl + c) Forest should gracefully shut down. Seems the signal is handled right now but not really well. We should investigate it.
To Reproduce Steps to reproduce the behaviour:
- Run
forest
, e.g.target/release/forest --target-peer-count 50 --encrypt-keystore false
- Terminate it with ctrl + c
- See the error.
Log output
On Fedora 35 with coredumps enabled (e.g. ulimit -S -c unlimited
)
2022-06-28T10:41:39.747Z INFO rpc > Ready for RPC connections
^C 2022-06-28T10:41:40.105Z WARN forest::cli > Got interrupt, shutting down...
2022-06-28T10:41:40.106Z INFO utils > Permissions set to 0600 on File { fd: 178, path: "/home/rumcajs-work/.forest/keystore.json", read: false, write: true }
2022-06-28T10:41:40.108Z INFO forest::daemon > Forest finish shutdown.
fish: Job 1, 'target/release/forest --target-…' terminated by signal SIGSEGV (Address boundary error)
On Ubuntu 20.04 LTS
2022-06-28T10:37:26.136Z INFO rpc > Ready for RPC connections
^C 2022-06-28T10:37:27.372Z WARN forest::cli > Got interrupt, shutting down...
2022-06-28T10:37:27.372Z INFO utils > Permissions set to 0600 on File { fd: 37, path: "/home/rumcajs/.forest/keystore.json", read: false, write: true }
2022-06-28T10:37:27.390Z INFO chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1936320 -> NETWORK_HEAD = 1936874
Downloading headers 0 / 554 [--------------------------------------------------------------------------------------------------------------------------------------------------] 0.00 % 0.00/s
2022-06-28T10:37:27.494Z INFO forest::daemon > Forest finish shutdown.
Segmentation fault (core dumped)
Expected behaviour
No errors should occur after Forest is terminated and coredump should not be dumped.
Screenshots
Environment (please complete the following information):
- OS: Fedora 35, Ubuntu 20.04
- Rust version
rustc 1.62.0-nightly (cb1219871 2022-05-08)
- Branch/commit
main
1ef1c1a09b6d572ba01fb64cffaec10c230d6c14
Other information and links
This started happening after we switched to the FVM. Maybe something in the WASM jit engine isn't being shutdown correctly.
Went through steps 1 and 2 on Mac. Not seeing the issue
Running forest requires too much dark magic, at the moment. It'll get better once we have our own snapshots and #1669 is implemented.
More context: This happens occasionally while syncing from a snapshot but rarely when the sync has completed.
Did you reproduced it as well on calibnet
?
@elmattic yes, it's pretty consistent on calibnet on ctrl-c
during headers download. Tested on Fedora 36, didn't have to set the setting above.
forest on î‚ main [$?] via 🦀 v1.62.0-nightly took 1m8s
❯ target/release/forest --encrypt-keystore false --target-peer-count 50 --chain calibnet
2022-08-04T09:50:15.964Z WARN forest::cli > No configurations found, using defaults.
2022-08-04T09:50:15.964Z INFO forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
2022-08-04T09:50:15.964Z INFO forest_libp2p::service > Recovered libp2p keypair from "/home/rumcajs-work/.local/share/forest/libp2p/keypair"
2022-08-04T09:50:15.964Z WARN forest::daemon > Warning: Keystore encryption disabled!
2022-08-04T09:50:15.964Z INFO forest::daemon > Prometheus server started at 127.0.0.1:6116
2022-08-04T09:50:15.964Z INFO forest::daemon > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.8IqfOBc_l9awxLRg7K6fzePe6jhkpalfrpxKDUqvljw
2022-08-04T09:50:16.330Z INFO genesis > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
2022-08-04T09:50:16.330Z INFO forest::daemon > Using network :: calibrationnet
2022-08-04T09:50:16.629Z INFO forest::daemon > JSON-RPC endpoint started at 127.0.0.1:1234
2022-08-04T09:50:16.629Z INFO rpc > Ready for RPC connections
2022-08-04T09:50:18.301Z INFO chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1087290 -> NETWORK_HEAD = 1184860
Downloading headers 100 / 97570 [>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.10 % 98.82/s 16m ^C 2022-08-04T09:50:19.777Z WARN forest::cli > Got interrupt, shutting down...
2022-08-04T09:50:19.777Z INFO utils > Permissions set to 0600 on File { fd: 6, path: "/home/rumcajs-work/.local/share/forest/keystore.json", read: false, write: true }
2022-08-04T09:50:19.778Z INFO forest::daemon > Forest finish shutdown.
fish: Job 1, 'target/release/forest --encrypt…' terminated by signal SIGSEGV (Address boundary error)
forest on î‚ main [$?] via 🦀 v1.62.0-nightly took 4s
❯ target/release/forest --encrypt-keystore false --target-peer-count 50 --chain calibnet
2022-08-04T09:51:24.011Z WARN forest::cli > No configurations found, using defaults.
2022-08-04T09:51:24.012Z INFO forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
2022-08-04T09:51:24.012Z INFO forest_libp2p::service > Recovered libp2p keypair from "/home/rumcajs-work/.local/share/forest/libp2p/keypair"
2022-08-04T09:51:24.012Z WARN forest::daemon > Warning: Keystore encryption disabled!
2022-08-04T09:51:24.012Z INFO forest::daemon > Prometheus server started at 127.0.0.1:6116
2022-08-04T09:51:24.012Z INFO forest::daemon > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.8IqfOBc_l9awxLRg7K6fzePe6jhkpalfrpxKDUqvljw
2022-08-04T09:51:24.050Z INFO genesis > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
2022-08-04T09:51:24.050Z INFO forest::daemon > Using network :: calibrationnet
2022-08-04T09:51:24.292Z INFO forest::daemon > JSON-RPC endpoint started at 127.0.0.1:1234
2022-08-04T09:51:24.292Z INFO rpc > Ready for RPC connections
2022-08-04T09:51:26.198Z INFO chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1087290 -> NETWORK_HEAD = 1184862
Downloading headers 0 / 97572 [---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.00 % 0.00/s ^C 2022-08-04T09:51:26.769Z WARN forest::cli > Got interrupt, shutting down...
2022-08-04T09:51:26.769Z INFO utils > Permissions set to 0600 on File { fd: 6, path: "/home/rumcajs-work/.local/share/forest/keystore.json", read: false, write: true }
2022-08-04T09:51:26.770Z INFO forest::daemon > Forest finish shutdown.
fish: Job 1, 'target/release/forest --encrypt…' terminated by signal SIGSEGV (Address boundary error)
forest on î‚ main [$?] via 🦀 v1.62.0-nightly took 3s
❯ target/release/forest --encrypt-keystore false --target-peer-count 50 --chain calibnet
2022-08-04T09:51:30.870Z WARN forest::cli > No configurations found, using defaults.
2022-08-04T09:51:30.870Z INFO forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
2022-08-04T09:51:30.870Z INFO forest_libp2p::service > Recovered libp2p keypair from "/home/rumcajs-work/.local/share/forest/libp2p/keypair"
2022-08-04T09:51:30.870Z WARN forest::daemon > Warning: Keystore encryption disabled!
2022-08-04T09:51:30.870Z INFO forest::daemon > Prometheus server started at 127.0.0.1:6116
2022-08-04T09:51:30.870Z INFO forest::daemon > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.8IqfOBc_l9awxLRg7K6fzePe6jhkpalfrpxKDUqvljw
2022-08-04T09:51:30.896Z INFO genesis > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
2022-08-04T09:51:30.897Z INFO forest::daemon > Using network :: calibrationnet
2022-08-04T09:51:31.140Z INFO forest::daemon > JSON-RPC endpoint started at 127.0.0.1:1234
2022-08-04T09:51:31.140Z INFO rpc > Ready for RPC connections
2022-08-04T09:51:33.862Z INFO chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1087290 -> NETWORK_HEAD = 1184863
Downloading headers 202 / 97573 [>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.21 % 64.33/s 25m ^C 2022-08-04T09:51:37.625Z WARN forest::cli > Got interrupt, shutting down...
2022-08-04T09:51:37.625Z INFO utils > Permissions set to 0600 on File { fd: 147, path: "/home/rumcajs-work/.local/share/forest/keystore.json", read: false, write: true }
2022-08-04T09:51:37.627Z INFO forest::daemon > Forest finish shutdown.
fish: Job 1, 'target/release/forest --encrypt…' terminated by signal SIGSEGV (Address boundary error)
@LesnyRumcajs What exact commit are you using to reproduce it? I feel I'm on the wrong commit (HEAD of main).
@elmattic it's the latest main for me, i.e. 2777962be6ef601bac10b7722489163f83d2c087
I don't get SIGSEGV while downloading headers. I /do/ get SIGSEGV if I hit ctrl-c when "Scanning blockchain" is shown. It's 100% consistent for me.
Still unable to reproduce on Fedora 36.
Oh, manage to reproduce now:
2022-08-05T13:42:13.681Z INFO chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1165085 -> NETWORK_HEAD = 1188204
Downloading headers 23119 / 23119 [===================================================================================================================================================================================================] 100.00 % 14806.30/s ^C 2022-08-05T13:42:15.437Z WARN forest::cli > Got interrupt, shutting down...
2022-08-05T13:42:15.438Z INFO utils > Permissions set to 0600 on File { fd: 114, path: "/root/.local/share/forest/keystore.json", read: false, write: true }
2022-08-05T13:42:15.813Z INFO chain_sync::tipset_syncer > Validating tipset: EPOCH = 1165085
2022-08-05T13:42:15.905Z INFO forest::daemon > Forest finish shutdown.
Segmentation fault (core dumped)
I'm seing this when calling dmesg:
[ 963.574597] process '/cgHI5jEa/dummy' started with executable stack
[ 5546.850740] show_signal_msg: 147 callbacks suppressed
[ 5546.850744] rocksdb:low[160415]: segfault at 556cdc038e00 ip 0000556cdc038e00 sp 00007f2eeeffa328 error 15 in forest[556cdbd80000+a90000]
[ 5546.852277] Code: 6e 66 69 67 75 72 61 62 6c 65 45 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <4e> 37 72 6f 63 6b 73 64 62 32 31 44 42 4f 70 74 69 6f 6e 73 43 6f
[ 5628.894060] rocksdb:low[160491]: segfault at 7f885546eea0 ip 00007f885546eea0 sp 00007f8851039e18 error 15 in libstdc++.so.6.0.30[7f885546d000+d000]
[ 5628.895183] Code: 00 00 30 fb 46 55 88 7f 00 00 60 54 3f 55 88 7f 00 00 78 fb 46 55 88 7f 00 00 00 00 00 00 00 00 00 00 78 ee 46 55 88 7f 00 00 <00> 6d 2f 55 88 7f 00 00 20 6d 2f 55 88 7f 00 00 c0 94 2f 55 88 7f
One lead could be some SIGTERM, SIGINT interference between wasmtime/rocksdb. There's an interesting thread from NEAR here.
I can reproduce it in a much much more consistent way when I hit ctrc-c just after rpc ready message:
# ./target/release/forest --chain=calibnet --target-peer-count 50 --encrypt-keystore false
2022-08-08T15:50:31.281Z WARN forest::cli > No configurations found, using defaults.
2022-08-08T15:50:31.281Z INFO forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
2022-08-08T15:50:31.281Z INFO forest_libp2p::service > Recovered libp2p keypair from "/root/.local/share/forest/libp2p/keypair"
2022-08-08T15:50:31.281Z WARN forest::daemon > Warning: Keystore encryption disabled!
2022-08-08T15:50:31.281Z INFO forest::daemon > Prometheus server started at 127.0.0.1:6116
2022-08-08T15:50:31.281Z INFO forest::daemon > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.7lPjSuwVsOC0xDsrIp_a8eOQM2kNcaQdKx2LSun1p_E
2022-08-08T15:50:31.351Z INFO genesis > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
2022-08-08T15:50:31.352Z INFO forest::daemon > Using network :: calibrationnet
2022-08-08T15:50:31.715Z INFO forest::daemon > block_until_sigint
2022-08-08T15:50:31.716Z INFO forest::daemon > JSON-RPC endpoint started at 127.0.0.1:1234
2022-08-08T15:50:31.716Z INFO rpc > Ready for RPC connections
^C 2022-08-08T15:50:32.329Z WARN forest::cli > Got interrupt, shutting down...
2022-08-08T15:50:32.330Z INFO utils > Permissions set to 0600 on File { fd: 45, path: "/root/.local/share/forest/keystore.json", read: false, write: true }
2022-08-08T15:50:32.330Z INFO forest::daemon > Forest finish shutdown.
Segmentation fault (core dumped)