nethermind
nethermind copied to clipboard
segfault error 4 in librocksdb.so while shutdown in progress
trafficstars
Describe the bug While shutting down nethermind there was a segfault observed (race condition?).
To Reproduce (happened only once in over 30 restarts in the last several days on several nodes)
- stop nethermind
- observe kernel log
Expected behavior A clean shutdown without Segfaults
Desktop (please complete the following information):
- OS: Ubuntu Focal 20.04 LTS
- Nethermind 1.13.3
- rocksdb
Syslog
Jun 22 16:11:02 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:02.5679|Processed 15007992 | 289ms of 9,198ms, mgasps 67.86 total 34.25, tps 830.51 total 421.78, bps 3.46 total 1.09, recv queue 0, proc queue 0
Jun 22 16:11:12 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:12.2548|Peers | with known best block: 7 | all: 16 |
Jun 22 16:11:26 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:26.2867|Processed 15007993 | 998ms of 23,719ms, mgasps 24.42 total 34.24, tps 375.82 total 421.71, bps 1.00 total 1.09, recv queue 0, proc queue 0
Jun 22 16:11:27 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:27.2552|Peers | with known best block: 7 | all: 16 |
Jun 22 16:11:27 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:27.2552|Sync peers 7(16)/15
Jun 22 16:11:30 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:30.7804|Processed 15007994 | 496ms of 4,494ms, mgasps 65.85 total 34.26, tps 892.01 total 422.09, bps 2.02 total 1.09, recv queue 0, proc queue 0
Jun 22 16:11:42 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:42.2587|Peers | with known best block: 7 | all: 16 |
Jun 22 16:11:45 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:45.1953|Processed 15007996 | 1,027ms of 14,415ms, mgasps 32.72 total 34.26, tps 433.38 total 422.11, bps 1.95 total 1.09, recv queue 0, proc queue 0
Jun 22 16:11:57 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:57.2607|Peers | with known best block: 7 | all: 16 |
Jun 22 16:12:02 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:02.1883|Processed 15007997 | 885ms of 16,993ms, mgasps 41.53 total 34.27, tps 568.20 total 422.31, bps 1.13 total 1.09, recv queue 0, proc queue 0
Jun 22 16:12:06 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:06.3971|Processed 15007998 | 118ms of 4,209ms, mgasps 211.53 total 34.30, tps 2842.18 total 422.77, bps 8.46 total 1.10, recv queue 0, proc queue 0
Jun 22 16:12:08 node1 systemd[1]: Stopping Nethermind Node...
Jun 22 16:12:08 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:08.8036|Closing, please wait until all functions are stopped properly...
Jun 22 16:12:08 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:08.8110|Persisting trie...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4405|Stopping session monitor...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4423|Stopping discovery app...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4475|Stopping discovery udp channel
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4475|Stopping block producer...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4475|Stopping sync peer pool...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Stopping peer pool...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Peer Pool shutdown complete.. please wait for all components to close
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Stopping peer manager...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Discovery shutdown complete.. please wait for all components to close
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Peer Manager shutdown complete.. please wait for all components to close
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Stopping synchronizer...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Peer update loop canceled
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|State sync task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast sync blocks downloader task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast blocks bodies task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Stopping blockchain processor...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast blocks receipts task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Full sync block downloader task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast blocks headers task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4609|Stopping rlpx peer...
Jun 22 16:12:10 node1 kernel: [ 3822.871420] show_signal_msg: 57 callbacks suppressed
Jun 22 16:12:10 node1 kernel: [ 3822.871423] .NET Long Runni[5025]: segfault at 750000000a ip 00007f3649a29094 sp 00007f3600ff7b58 error 4 in librocksdb.so[7f36496bf000+70a000]
Jun 22 16:12:10 node1 kernel: [ 3822.871435] Code: 12 e4 e4 ff 66 90 0f b6 47 30 c3 90 66 2e 0f 1f 84 00 00 00 00 00 c6 47 30 01 c3 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 38 <8b> 40 08 c3 0f 1f 84 00 00 00 00 00 e9 db 31 e4 ff 90 66 2e 0f 1f
Jun 22 16:12:11 node1 systemd[1]: nethermind.service: Main process exited, code=killed, status=11/SEGV
Jun 22 16:12:11 node1 systemd[1]: nethermind.service: Failed with result 'signal'.
Jun 22 16:12:11 node1 systemd[1]: Stopped Nethermind Node.
Additional information
% dpkg -l | grep -P 'snappy|rocksdb'
ii librocksdb5.17 5.17.2-3 amd64 persistent Key-Value Store for Flash and RAM Storage
ii libsnappy-dev:amd64 1.1.8-1build1 amd64 fast compression/decompression library (development files)
ii libsnappy1v5:amd64 1.1.8-1build1 amd64 fast compression/decompression library
% cat /etc/systemd/system/nethermind.service
[Unit]
Description=Nethermind Node
Documentation=https://docs.nethermind.io
After=network.target
[Service]
User=nethermind
Group=nethermind
WorkingDirectory=/home/nethermind/.nm-data
EnvironmentFile=/home/nethermind/.nm-data/.env
ExecStart=/home/nethermind/nethermind-client/Nethermind.Runner --datadir /home/nethermind/.nm-data
LimitNOFILE=1000000
Restart=always
PrivateTmp=true
TimeoutStopSec=900s
TimeoutStartSec=3s
StartLimitInterval=120s
StartLimitBurst=5
[Install]
WantedBy=default.target
% cat /home/nethermind/.nm-data/.env
NETHERMIND_CONFIG="mainnet"
NETHERMIND_JSONRPCCONFIG_ENABLED="true"
NETHERMIND_JSONRPCCONFIG_HOST="0.0.0.0"
NETHERMIND_NETWORKCONFIG_MAXACTIVEPEERS="15"
NETHERMIND_INITCONFIG_LOGFILENAME="log.txt"
#
# Special
#
NETHERMIND_NETWORKCONFIG_P2PPORT="30304"
NETHERMIND_NETWORKCONFIG_DISCOVERYPORT="30304"
NETHERMIND_JSONRPCCONFIG_PORT=8546
Noticed similar error but in libpthread-2.27.so on Ubuntu 18.04 too:
[14953629.035999] .NET ThreadPool[4069491]: segfault at 18 ip 00007f6d54d03892 sp 00007f64acdfbf40 error 4 in libpthread-2.27.so[7f6d54cf7000+1a000]
[14953629.036156] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
[14953930.360088] SingleThreadEve[4078154]: segfault at 18 ip 00007f4286a18892 sp 00007f3a17fce2d0 error 4 in libpthread-2.27.so[7f4286a0c000+1a000]
[14953930.360122] SingleThreadEve[4078155]: segfault at 18 ip 00007f4286a18892 sp 00007f3a177cd2d0 error 4 in libpthread-2.27.so[7f4286a0c000+1a000]
[14953930.360267] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
[14953930.360412] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
[14954129.972989] SingleThreadEve[4080910]: segfault at 18 ip 00007f0dda8cb892 sp 00007f05687c72d0 error 4 in libpthread-2.27.so[7f0dda8bf000+1a000]
[14954129.973148] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
@gituser @c0deright, We updated RocksDb a few times. Please try the latest version of Nethermind. I'm closing but if it is still valid feel free to reopen.