forest icon indicating copy to clipboard operation
forest copied to clipboard

Terminate Forest gracefully

Open LesnyRumcajs opened this issue 2 years ago • 13 comments

Describe the bug

When sending SIGTERM (ctrl + c) Forest should gracefully shut down. Seems the signal is handled right now but not really well. We should investigate it.

To Reproduce Steps to reproduce the behaviour:

  1. Run forest, e.g. target/release/forest --target-peer-count 50 --encrypt-keystore false
  2. Terminate it with ctrl + c
  3. See the error.

Log output

On Fedora 35 with coredumps enabled (e.g. ulimit -S -c unlimited)

 2022-06-28T10:41:39.747Z INFO  rpc                    > Ready for RPC connections
^C 2022-06-28T10:41:40.105Z WARN  forest::cli            > Got interrupt, shutting down...
 2022-06-28T10:41:40.106Z INFO  utils                  > Permissions set to 0600 on File { fd: 178, path: "/home/rumcajs-work/.forest/keystore.json", read: false, write: true }
 2022-06-28T10:41:40.108Z INFO  forest::daemon         > Forest finish shutdown.
fish: Job 1, 'target/release/forest --target-…' terminated by signal SIGSEGV (Address boundary error)

On Ubuntu 20.04 LTS

 2022-06-28T10:37:26.136Z INFO  rpc                    > Ready for RPC connections
^C 2022-06-28T10:37:27.372Z WARN  forest::cli            > Got interrupt, shutting down...
 2022-06-28T10:37:27.372Z INFO  utils                  > Permissions set to 0600 on File { fd: 37, path: "/home/rumcajs/.forest/keystore.json", read: false, write: true }
 2022-06-28T10:37:27.390Z INFO  chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1936320 -> NETWORK_HEAD = 1936874
Downloading headers 0 / 554 [--------------------------------------------------------------------------------------------------------------------------------------------------] 0.00 % 0.00/s
 2022-06-28T10:37:27.494Z INFO  forest::daemon          > Forest finish shutdown.
Segmentation fault (core dumped)

Expected behaviour

No errors should occur after Forest is terminated and coredump should not be dumped.

Screenshots

Environment (please complete the following information):

  • OS: Fedora 35, Ubuntu 20.04
  • Rust version rustc 1.62.0-nightly (cb1219871 2022-05-08)
  • Branch/commit main 1ef1c1a09b6d572ba01fb64cffaec10c230d6c14

Other information and links

LesnyRumcajs avatar Jun 28 '22 11:06 LesnyRumcajs

This started happening after we switched to the FVM. Maybe something in the WASM jit engine isn't being shutdown correctly.

lemmih avatar Jun 28 '22 11:06 lemmih

Went through steps 1 and 2 on Mac. Not seeing the issue

tyshko5 avatar Jul 21 '22 13:07 tyshko5

Running forest requires too much dark magic, at the moment. It'll get better once we have our own snapshots and #1669 is implemented.

lemmih avatar Jul 21 '22 13:07 lemmih

More context: This happens occasionally while syncing from a snapshot but rarely when the sync has completed.

lemmih avatar Jul 21 '22 13:07 lemmih

Did you reproduced it as well on calibnet?

elmattic avatar Aug 04 '22 09:08 elmattic

@elmattic yes, it's pretty consistent on calibnet on ctrl-c during headers download. Tested on Fedora 36, didn't have to set the setting above.

forest on  main [$?] via 🦀 v1.62.0-nightly took 1m8s
❯ target/release/forest --encrypt-keystore false --target-peer-count 50 --chain calibnet
 2022-08-04T09:50:15.964Z WARN  forest::cli > No configurations found, using defaults.
 2022-08-04T09:50:15.964Z INFO  forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
 2022-08-04T09:50:15.964Z INFO  forest_libp2p::service > Recovered libp2p keypair from "/home/rumcajs-work/.local/share/forest/libp2p/keypair"
 2022-08-04T09:50:15.964Z WARN  forest::daemon         > Warning: Keystore encryption disabled!
 2022-08-04T09:50:15.964Z INFO  forest::daemon         > Prometheus server started at 127.0.0.1:6116
 2022-08-04T09:50:15.964Z INFO  forest::daemon         > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.8IqfOBc_l9awxLRg7K6fzePe6jhkpalfrpxKDUqvljw
 2022-08-04T09:50:16.330Z INFO  genesis                > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
 2022-08-04T09:50:16.330Z INFO  forest::daemon         > Using network :: calibrationnet
 2022-08-04T09:50:16.629Z INFO  forest::daemon         > JSON-RPC endpoint started at 127.0.0.1:1234
 2022-08-04T09:50:16.629Z INFO  rpc                    > Ready for RPC connections
 2022-08-04T09:50:18.301Z INFO  chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1087290 -> NETWORK_HEAD = 1184860
Downloading headers 100 / 97570 [>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.10 % 98.82/s 16m ^C 2022-08-04T09:50:19.777Z WARN  forest::cli             > Got interrupt, shutting down...
 2022-08-04T09:50:19.777Z INFO  utils                   > Permissions set to 0600 on File { fd: 6, path: "/home/rumcajs-work/.local/share/forest/keystore.json", read: false, write: true }
 2022-08-04T09:50:19.778Z INFO  forest::daemon          > Forest finish shutdown.
fish: Job 1, 'target/release/forest --encrypt…' terminated by signal SIGSEGV (Address boundary error)




forest on  main [$?] via 🦀 v1.62.0-nightly took 4s
❯ target/release/forest --encrypt-keystore false --target-peer-count 50 --chain calibnet
 2022-08-04T09:51:24.011Z WARN  forest::cli > No configurations found, using defaults.
 2022-08-04T09:51:24.012Z INFO  forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
 2022-08-04T09:51:24.012Z INFO  forest_libp2p::service > Recovered libp2p keypair from "/home/rumcajs-work/.local/share/forest/libp2p/keypair"
 2022-08-04T09:51:24.012Z WARN  forest::daemon         > Warning: Keystore encryption disabled!
 2022-08-04T09:51:24.012Z INFO  forest::daemon         > Prometheus server started at 127.0.0.1:6116
 2022-08-04T09:51:24.012Z INFO  forest::daemon         > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.8IqfOBc_l9awxLRg7K6fzePe6jhkpalfrpxKDUqvljw
 2022-08-04T09:51:24.050Z INFO  genesis                > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
 2022-08-04T09:51:24.050Z INFO  forest::daemon         > Using network :: calibrationnet
 2022-08-04T09:51:24.292Z INFO  forest::daemon         > JSON-RPC endpoint started at 127.0.0.1:1234
 2022-08-04T09:51:24.292Z INFO  rpc                    > Ready for RPC connections
 2022-08-04T09:51:26.198Z INFO  chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1087290 -> NETWORK_HEAD = 1184862
Downloading headers 0 / 97572 [---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.00 % 0.00/s ^C 2022-08-04T09:51:26.769Z WARN  forest::cli             > Got interrupt, shutting down...
 2022-08-04T09:51:26.769Z INFO  utils                   > Permissions set to 0600 on File { fd: 6, path: "/home/rumcajs-work/.local/share/forest/keystore.json", read: false, write: true }
 2022-08-04T09:51:26.770Z INFO  forest::daemon          > Forest finish shutdown.
fish: Job 1, 'target/release/forest --encrypt…' terminated by signal SIGSEGV (Address boundary error)




forest on  main [$?] via 🦀 v1.62.0-nightly took 3s
❯ target/release/forest --encrypt-keystore false --target-peer-count 50 --chain calibnet
 2022-08-04T09:51:30.870Z WARN  forest::cli > No configurations found, using defaults.
 2022-08-04T09:51:30.870Z INFO  forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
 2022-08-04T09:51:30.870Z INFO  forest_libp2p::service > Recovered libp2p keypair from "/home/rumcajs-work/.local/share/forest/libp2p/keypair"
 2022-08-04T09:51:30.870Z WARN  forest::daemon         > Warning: Keystore encryption disabled!
 2022-08-04T09:51:30.870Z INFO  forest::daemon         > Prometheus server started at 127.0.0.1:6116
 2022-08-04T09:51:30.870Z INFO  forest::daemon         > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.8IqfOBc_l9awxLRg7K6fzePe6jhkpalfrpxKDUqvljw
 2022-08-04T09:51:30.896Z INFO  genesis                > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
 2022-08-04T09:51:30.897Z INFO  forest::daemon         > Using network :: calibrationnet
 2022-08-04T09:51:31.140Z INFO  forest::daemon         > JSON-RPC endpoint started at 127.0.0.1:1234
 2022-08-04T09:51:31.140Z INFO  rpc                    > Ready for RPC connections
 2022-08-04T09:51:33.862Z INFO  chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1087290 -> NETWORK_HEAD = 1184863
Downloading headers 202 / 97573 [>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0.21 % 64.33/s 25m ^C 2022-08-04T09:51:37.625Z WARN  forest::cli             > Got interrupt, shutting down...
 2022-08-04T09:51:37.625Z INFO  utils                   > Permissions set to 0600 on File { fd: 147, path: "/home/rumcajs-work/.local/share/forest/keystore.json", read: false, write: true }
 2022-08-04T09:51:37.627Z INFO  forest::daemon          > Forest finish shutdown.
fish: Job 1, 'target/release/forest --encrypt…' terminated by signal SIGSEGV (Address boundary error)

LesnyRumcajs avatar Aug 04 '22 09:08 LesnyRumcajs

@LesnyRumcajs What exact commit are you using to reproduce it? I feel I'm on the wrong commit (HEAD of main).

elmattic avatar Aug 04 '22 10:08 elmattic

@elmattic it's the latest main for me, i.e. 2777962be6ef601bac10b7722489163f83d2c087

LesnyRumcajs avatar Aug 04 '22 10:08 LesnyRumcajs

I don't get SIGSEGV while downloading headers. I /do/ get SIGSEGV if I hit ctrl-c when "Scanning blockchain" is shown. It's 100% consistent for me.

lemmih avatar Aug 04 '22 10:08 lemmih

Still unable to reproduce on Fedora 36.

elmattic avatar Aug 05 '22 13:08 elmattic

Oh, manage to reproduce now:

 2022-08-05T13:42:13.681Z INFO  chain_sync::chain_muxer > Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1165085 -> NETWORK_HEAD = 1188204
Downloading headers 23119 / 23119 [===================================================================================================================================================================================================] 100.00 % 14806.30/s ^C 2022-08-05T13:42:15.437Z WARN  forest::cli             > Got interrupt, shutting down...
 2022-08-05T13:42:15.438Z INFO  utils                   > Permissions set to 0600 on File { fd: 114, path: "/root/.local/share/forest/keystore.json", read: false, write: true }
 2022-08-05T13:42:15.813Z INFO  chain_sync::tipset_syncer > Validating tipset: EPOCH = 1165085
 2022-08-05T13:42:15.905Z INFO  forest::daemon            > Forest finish shutdown.
Segmentation fault (core dumped)

elmattic avatar Aug 05 '22 13:08 elmattic

I'm seing this when calling dmesg:

[  963.574597] process '/cgHI5jEa/dummy' started with executable stack
[ 5546.850740] show_signal_msg: 147 callbacks suppressed
[ 5546.850744] rocksdb:low[160415]: segfault at 556cdc038e00 ip 0000556cdc038e00 sp 00007f2eeeffa328 error 15 in forest[556cdbd80000+a90000]
[ 5546.852277] Code: 6e 66 69 67 75 72 61 62 6c 65 45 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <4e> 37 72 6f 63 6b 73 64 62 32 31 44 42 4f 70 74 69 6f 6e 73 43 6f
[ 5628.894060] rocksdb:low[160491]: segfault at 7f885546eea0 ip 00007f885546eea0 sp 00007f8851039e18 error 15 in libstdc++.so.6.0.30[7f885546d000+d000]
[ 5628.895183] Code: 00 00 30 fb 46 55 88 7f 00 00 60 54 3f 55 88 7f 00 00 78 fb 46 55 88 7f 00 00 00 00 00 00 00 00 00 00 78 ee 46 55 88 7f 00 00 <00> 6d 2f 55 88 7f 00 00 20 6d 2f 55 88 7f 00 00 c0 94 2f 55 88 7f

One lead could be some SIGTERM, SIGINT interference between wasmtime/rocksdb. There's an interesting thread from NEAR here.

elmattic avatar Aug 05 '22 13:08 elmattic

I can reproduce it in a much much more consistent way when I hit ctrc-c just after rpc ready message:

# ./target/release/forest --chain=calibnet --target-peer-count 50 --encrypt-keystore false
 2022-08-08T15:50:31.281Z WARN  forest::cli > No configurations found, using defaults.
 2022-08-08T15:50:31.281Z INFO  forest::daemon > Starting Forest daemon, version v0.2.2/unstable/2777962b
 2022-08-08T15:50:31.281Z INFO  forest_libp2p::service > Recovered libp2p keypair from "/root/.local/share/forest/libp2p/keypair"
 2022-08-08T15:50:31.281Z WARN  forest::daemon         > Warning: Keystore encryption disabled!
 2022-08-08T15:50:31.281Z INFO  forest::daemon         > Prometheus server started at 127.0.0.1:6116
 2022-08-08T15:50:31.281Z INFO  forest::daemon         > Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXX0.7lPjSuwVsOC0xDsrIp_a8eOQM2kNcaQdKx2LSun1p_E
 2022-08-08T15:50:31.351Z INFO  genesis                > Initialized genesis: BlockHeader: Cid(bafy2bzacecz3trtejxtzix4f4eebs7dekm6snfsmvffiqz2rfx7iwgsgtieq4)
 2022-08-08T15:50:31.352Z INFO  forest::daemon         > Using network :: calibrationnet
 2022-08-08T15:50:31.715Z INFO  forest::daemon         > block_until_sigint
 2022-08-08T15:50:31.716Z INFO  forest::daemon         > JSON-RPC endpoint started at 127.0.0.1:1234
 2022-08-08T15:50:31.716Z INFO  rpc                    > Ready for RPC connections
^C 2022-08-08T15:50:32.329Z WARN  forest::cli            > Got interrupt, shutting down...
 2022-08-08T15:50:32.330Z INFO  utils                  > Permissions set to 0600 on File { fd: 45, path: "/root/.local/share/forest/keystore.json", read: false, write: true }
 2022-08-08T15:50:32.330Z INFO  forest::daemon         > Forest finish shutdown.
Segmentation fault (core dumped)

elmattic avatar Aug 08 '22 15:08 elmattic