nethermind icon indicating copy to clipboard operation
nethermind copied to clipboard

segfault error 4 in librocksdb.so while shutdown in progress

Open c0deright opened this issue 3 years ago • 1 comments
trafficstars

Describe the bug While shutting down nethermind there was a segfault observed (race condition?).

To Reproduce (happened only once in over 30 restarts in the last several days on several nodes)

  1. stop nethermind
  2. observe kernel log

Expected behavior A clean shutdown without Segfaults

Desktop (please complete the following information):

  • OS: Ubuntu Focal 20.04 LTS
  • Nethermind 1.13.3
  • rocksdb

Syslog

Jun 22 16:11:02 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:02.5679|Processed   15007992 |      289ms of   9,198ms, mgasps   67.86 total   34.25, tps  830.51 total  421.78, bps    3.46 total    1.09, recv queue 0, proc queue 0
Jun 22 16:11:12 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:12.2548|Peers | with known best block: 7 | all: 16 |
Jun 22 16:11:26 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:26.2867|Processed   15007993 |      998ms of  23,719ms, mgasps   24.42 total   34.24, tps  375.82 total  421.71, bps    1.00 total    1.09, recv queue 0, proc queue 0
Jun 22 16:11:27 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:27.2552|Peers | with known best block: 7 | all: 16 |
Jun 22 16:11:27 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:27.2552|Sync peers 7(16)/15
Jun 22 16:11:30 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:30.7804|Processed   15007994 |      496ms of   4,494ms, mgasps   65.85 total   34.26, tps  892.01 total  422.09, bps    2.02 total    1.09, recv queue 0, proc queue 0
Jun 22 16:11:42 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:42.2587|Peers | with known best block: 7 | all: 16 |
Jun 22 16:11:45 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:45.1953|Processed   15007996 |    1,027ms of  14,415ms, mgasps   32.72 total   34.26, tps  433.38 total  422.11, bps    1.95 total    1.09, recv queue 0, proc queue 0
Jun 22 16:11:57 node1 Nethermind.Runner[4743]: 2022-06-22 16:11:57.2607|Peers | with known best block: 7 | all: 16 |
Jun 22 16:12:02 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:02.1883|Processed   15007997 |      885ms of  16,993ms, mgasps   41.53 total   34.27, tps  568.20 total  422.31, bps    1.13 total    1.09, recv queue 0, proc queue 0
Jun 22 16:12:06 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:06.3971|Processed   15007998 |      118ms of   4,209ms, mgasps  211.53 total   34.30, tps 2842.18 total  422.77, bps    8.46 total    1.10, recv queue 0, proc queue 0
Jun 22 16:12:08 node1 systemd[1]: Stopping Nethermind Node...
Jun 22 16:12:08 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:08.8036|Closing, please wait until all functions are stopped properly...
Jun 22 16:12:08 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:08.8110|Persisting trie...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4405|Stopping session monitor...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4423|Stopping discovery app...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4475|Stopping discovery udp channel
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4475|Stopping block producer...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4475|Stopping sync peer pool...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Stopping peer pool...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Peer Pool shutdown complete.. please wait for all components to close
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Stopping peer manager...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Discovery shutdown complete.. please wait for all components to close
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Peer Manager shutdown complete.. please wait for all components to close
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4506|Stopping synchronizer...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Peer update loop canceled
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|State sync task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast sync blocks downloader task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast blocks bodies task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Stopping blockchain processor...
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast blocks receipts task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Full sync block downloader task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4547|Fast blocks headers task completed.
Jun 22 16:12:10 node1 Nethermind.Runner[4743]: 2022-06-22 16:12:10.4609|Stopping rlpx peer...
Jun 22 16:12:10 node1 kernel: [ 3822.871420] show_signal_msg: 57 callbacks suppressed
Jun 22 16:12:10 node1 kernel: [ 3822.871423] .NET Long Runni[5025]: segfault at 750000000a ip 00007f3649a29094 sp 00007f3600ff7b58 error 4 in librocksdb.so[7f36496bf000+70a000]
Jun 22 16:12:10 node1 kernel: [ 3822.871435] Code: 12 e4 e4 ff 66 90 0f b6 47 30 c3 90 66 2e 0f 1f 84 00 00 00 00 00 c6 47 30 01 c3 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 38 <8b> 40 08 c3 0f 1f 84 00 00 00 00 00 e9 db 31 e4 ff 90 66 2e 0f 1f
Jun 22 16:12:11 node1 systemd[1]: nethermind.service: Main process exited, code=killed, status=11/SEGV
Jun 22 16:12:11 node1 systemd[1]: nethermind.service: Failed with result 'signal'.
Jun 22 16:12:11 node1 systemd[1]: Stopped Nethermind Node.

Additional information

% dpkg -l | grep -P 'snappy|rocksdb'
ii  librocksdb5.17                     5.17.2-3                          amd64        persistent Key-Value Store for Flash and RAM Storage
ii  libsnappy-dev:amd64                1.1.8-1build1                     amd64        fast compression/decompression library (development files)
ii  libsnappy1v5:amd64                 1.1.8-1build1                     amd64        fast compression/decompression library
% cat /etc/systemd/system/nethermind.service
[Unit]
Description=Nethermind Node
Documentation=https://docs.nethermind.io
After=network.target

[Service]
User=nethermind
Group=nethermind

WorkingDirectory=/home/nethermind/.nm-data
EnvironmentFile=/home/nethermind/.nm-data/.env
ExecStart=/home/nethermind/nethermind-client/Nethermind.Runner --datadir /home/nethermind/.nm-data
LimitNOFILE=1000000

Restart=always
PrivateTmp=true
TimeoutStopSec=900s
TimeoutStartSec=3s
StartLimitInterval=120s
StartLimitBurst=5

[Install]
WantedBy=default.target
% cat /home/nethermind/.nm-data/.env
NETHERMIND_CONFIG="mainnet"
NETHERMIND_JSONRPCCONFIG_ENABLED="true"
NETHERMIND_JSONRPCCONFIG_HOST="0.0.0.0"
NETHERMIND_NETWORKCONFIG_MAXACTIVEPEERS="15"
NETHERMIND_INITCONFIG_LOGFILENAME="log.txt"
#
# Special
#
NETHERMIND_NETWORKCONFIG_P2PPORT="30304"
NETHERMIND_NETWORKCONFIG_DISCOVERYPORT="30304"
NETHERMIND_JSONRPCCONFIG_PORT=8546

c0deright avatar Jun 22 '22 15:06 c0deright

Noticed similar error but in libpthread-2.27.so on Ubuntu 18.04 too:

[14953629.035999] .NET ThreadPool[4069491]: segfault at 18 ip 00007f6d54d03892 sp 00007f64acdfbf40 error 4 in libpthread-2.27.so[7f6d54cf7000+1a000]
[14953629.036156] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
[14953930.360088] SingleThreadEve[4078154]: segfault at 18 ip 00007f4286a18892 sp 00007f3a17fce2d0 error 4 in libpthread-2.27.so[7f4286a0c000+1a000]
[14953930.360122] SingleThreadEve[4078155]: segfault at 18 ip 00007f4286a18892 sp 00007f3a177cd2d0 error 4 in libpthread-2.27.so[7f4286a0c000+1a000]
[14953930.360267] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
[14953930.360412] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83
[14954129.972989] SingleThreadEve[4080910]: segfault at 18 ip 00007f0dda8cb892 sp 00007f05687c72d0 error 4 in libpthread-2.27.so[7f0dda8bf000+1a000]
[14954129.973148] Code: ff 83 f8 92 0f 85 3e ff ff ff eb cc 66 2e 0f 1f 84 00 00 00 00 00 66 90 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 90 <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 74 6f 8b 07 89 c1 89 c2 83

gituser avatar Jun 22 '22 16:06 gituser

@gituser @c0deright, We updated RocksDb a few times. Please try the latest version of Nethermind. I'm closing but if it is still valid feel free to reopen.

MarekM25 avatar Aug 24 '23 15:08 MarekM25