mssql repeatedly crashing under kernel 6.7.0
Steps To Reproduce
I wondered why something else on my home server was suddenly failing this morning. It turned out that /home was full. The cause was the bitwarden mssql container repeatedly crashing and filling the partition with crash dumps/logs.
This only started after I rebooted into a 6.7.0 kernel this morning. It's fine back on a 6.6.8 kernel now with the original database files.
It's perfectly possible I made a bad decision on some new 6.7.0 kernel config option.
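In case it helps anyone hitting the same symptom, here is roughly how I tracked down what was filling /home (treat this as a sketch; the bwdata path assumes a standard self-hosted install under the Bitwarden user's home and may differ on your setup):
# Confirm the partition really is full, then find the biggest offenders under the Bitwarden data directory.
$ df -h /home
$ sudo du -xh --max-depth=3 /home/<bitwarden-user>/bwdata | sort -rh | head -20
# In my case the space was going to mssql crash dump directories named core.sqlservr.<timestamp>.<pid>.d:
$ sudo find /home/<bitwarden-user>/bwdata -type d -name 'core.sqlservr.*' -exec du -sh {} +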
Expected Result
The mssql container to continue running.
Actual Result
The mssql container repeatedly crashes.
I would have tried restoring a database backup, but couldn't (easily) do that without the container running.
I also tried backing up the live mssql/data directory, deleting its contents, and starting again; the same sort of crash resulted.
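For reference, that backup-and-reset attempt was essentially the following (a rough sketch; the bwdata/mssql/data path assumes a standard self-hosted layout):
# Stop the stack, snapshot the live database directory, then clear it so mssql initialises a fresh database.
$ ./bitwarden.sh stop
$ sudo tar czf mssql-data-backup.tar.gz -C bwdata/mssql data
$ sudo rm -rf bwdata/mssql/data/*
$ ./bitwarden.sh start
# On 6.7.0 the freshly initialised database crashed in the same way; restoring the tarball and
# rebooting into 6.6.8 brought everything back.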
Screenshots or Videos
No response
Additional Context
This is with a self-compiled 6.7.0 kernel.
Here's the .config for the compile. I couldn't immediately spot any culprit when diffing this with my v6.6.8 .config:
kernel-config-v6.7.0.txt
And the 6.6.8 .config for comparison:
kernel-config-v6.6.8.txt
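For diffing the two configs, the kernel tree's own helper gives a more readable summary than plain diff (run from the kernel source directory; the filenames are just the attachments above):
# scripts/diffconfig ships with the kernel source and prints only added/removed/changed options.
$ ./scripts/diffconfig kernel-config-v6.6.8.txt kernel-config-v6.7.0.txt
# A plain unified diff works too, it is just noisier:
$ diff -u kernel-config-v6.6.8.txt kernel-config-v6.7.0.txt | less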
A crash.json file from one of the dump directories:
{
  "reason": "0x00000001",
  "processName": "sqlservr",
  "pid": "42",
  "instanceId": "dd8cd82b-efd7-43f1-be37-7bfb1811ece1",
  "crashId": "80680a56-1f30-489d-857b-3f8dd7dcbb66",
  "threadState": "0x00007fbe0fcb56a8",
  "threadId": "157",
  "libosThreadId": "0x1e8",
  "buildStamp": "e5dea205d0938e2848fb2509856a7e8f30783e6d5f62d0c88355e288de0db89a",
  "buildNum": "212470",
  "signal": "6",
  "signalText": "SIGABRT",
  "stack": [
    "0x0000558630379dd1",
    "0x00005586303784f0",
    "0x0000558630377af1",
    "0x00007fbe142cb090",
    "0x00007fbe142cb00b",
    "0x00007fbe142aa859",
    "0x00005586302ef692"
  ],
  "stackText": [
    "<unknown>",
    "<unknown>",
    "<unknown>",
    "killpg+0x40",
    "gsignal+0xcb",
    "abort+0x12b",
    "<unknown>"
  ],
  "last_errno": "2",
  "last_errno_text": "No such file or directory",
  "distribution": "Ubuntu 20.04.4 LTS",
  "processors": "8",
  "total_memory": "33547169792",
  "timestamp": "Sat Jan 20 11:14:16 2024"
}
Without the core dump, which is 1.9 GiB, here's one of the dump directories: core.sqlservr.01_20_2024_11_14_17.42.d-NO_CORE_DUMP.tar.gz
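(If you want to share a dump directory the same way, the multi-GiB core file inside it, the .gdmp, can be left out when creating the archive; exact names vary per crash, so this is just an example based on the directory above.)
$ tar czf core.sqlservr.01_20_2024_11_14_17.42.d-NO_CORE_DUMP.tar.gz \
      --exclude='*.gdmp' core.sqlservr.01_20_2024_11_14_17.42.d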
Build Version
2024.1.1
Environment
Self-Hosted
Environment Details
OS: Debian 12.4 (bookworm/stable)
Kernel: 6.7.0 - compiled myself, see above for the .config
bitwarden.sh version 2024.1.1
Docker version 25.0.0, build e758fe5
Docker Compose version v2.24.1
Issue Tracking Info
- [X] I understand that work is tracked outside of Github. A PR will be linked to this issue should one be opened to address it, but Bitwarden doesn't use fields like "assigned", "milestone", or "project" to track progress.
And no sooner do I open this than I see that kernel 6.7.1 is now released. I'll try that soon to see if it helps.
The issue is still present on 6.7.1, but not on 6.6.13 (I figured I might as well end up on the latest 6.6.x if needs be).
Same issue. I am using Arch as a host and I downgraded to 6.6.7 and it is working again. This is not specific to BitWarden as it is also reported on the AUR for the mssql package. https://aur.archlinux.org/packages/mssql-server
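For other Arch users, a rough sketch of the downgrade (the package filename is only an example; check https://archive.archlinux.org/packages/l/linux/ for the exact name the archive lists):
# Install a pre-6.7 kernel package from the Arch Linux Archive, then reboot into it.
$ sudo pacman -U https://archive.archlinux.org/packages/l/linux/linux-6.6.7.arch1-1-x86_64.pkg.tar.zst
$ sudo reboot
# To hold the kernel there until the mssql fix lands, add it to IgnorePkg in /etc/pacman.conf:
#   IgnorePkg = linux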
Phew, not just me then. I figured it was probably an issue specific to mssql, but chose to report here as it could have been some combination of the newer kernel and configuration options bitwarden utilises.
Thank you for your report. This appears to be an issue between the mssql container and the OS/kernel. In this case, we can only wait until a fix is available before users can upgrade to the latest kernel.
Regards,
Customer Success Team
It's BitWarden's container. So the real question is: will BitWarden file an issue with Microsoft, the LKML, or both?
It seems an issue has already been opened on mssql-docker's GitHub:
https://github.com/microsoft/mssql-docker/issues/868
The problem is still there with kernel 6.7.2
MS seems to only be fixing the current version. So will BitWarden update to SQL '22 or ...?
Bitwarden will run MS SQL 2022 starting the next release.
https://github.com/bitwarden/server/pull/3580
Unfortunately, the issue persists after bitwarden/mssql's update to mssql 2022, and with kernel v6.7.5.
OS Info:
$ uname -r
6.7.5-200.fc39.x86_64
$ cat /etc/fedora-release
Fedora release 39 (Thirty Nine)
$ docker images | grep mssql
bitwarden/mssql 2024.2.2 6b98c1e93b1c 6 days ago 1.59GB
mcr.microsoft.com/mssql/server 2019-CU24-ubuntu-20.04 b5316516906d 2 months ago 1.47GB
mcr.microsoft.com/mssql/server 2022-CU11-ubuntu-22.04 ffdd6981a89e 3 months ago 1.58GB
mcr.microsoft.com/mssql/server 2022-CU10-ubuntu-22.04 86b87ec5e60a 3 months ago 1.57GB
Running bitwarden/mssql:2024.2.2:
$ docker run --rm -it -e ACCEPT_EULA=y bitwarden/mssql:2024.2.2
This program has encountered a fatal error and cannot continue running at Tue Feb 27 00:55:28 2024
The following diagnostic information is available:
Reason: 0x00000001
Signal: SIGABRT - Aborted (6)
Stack:
IP Function
---------------- --------------------------------------
00005571445afce1 std::__1::bad_function_call::~bad_function_call()+0x96661
00005571445af6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
00005571445aec2f std::__1::bad_function_call::~bad_function_call()+0x955af
00007fd817842520 __sigaction+0x50
00007fd8178969fc pthread_kill+0x12c
00007fd817842476 raise+0x16
00007fd8178287f3 abort+0xd3
0000557144580d96 std::__1::bad_function_call::~bad_function_call()+0x67716
00005571445bd5b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
00005571445eb318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
00005571445eb0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
000055714458720a std::__1::bad_function_call::~bad_function_call()+0x6db8a
0000557144586e80 std::__1::bad_function_call::~bad_function_call()+0x6d800
Process: 44 - sqlservr
Thread: 135 (application thread 0x184)
Instance Id: 056fee9f-db2a-48f9-a280-65edf3e521f7
Crash Id: 6a513c6c-ba13-4311-86a1-45c59829e555
Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
Distribution: Ubuntu 22.04.3 LTS
Processors: 6
Total Memory: 16764084224 bytes
Timestamp: Tue Feb 27 00:55:28 2024
Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 44
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_55_28.44
Executing: /opt/mssql/bin/handle-crash.sh with parameters
handle-crash.sh
/opt/mssql/bin/sqlservr
44
/opt/mssql/bin
/var/opt/mssql/log/
056fee9f-db2a-48f9-a280-65edf3e521f7
6a513c6c-ba13-4311-86a1-45c59829e555
/var/opt/mssql/log/core.sqlservr.2_27_2024_0_55_28.44
Ubuntu 22.04.3 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/44/maps: Permission denied
^Ccat: /proc/44/environ: Permission denied
find: '/proc/44': No such file or directory
find: '/proc/44': No such file or directory
find: '/proc/44': No such file or directory
find: '/proc/44': No such file or directory
dmesg: read kernel buffer failed: Operation not permitted
timeout: failed to run command 'journalctl': No such file or directory
timeout: failed to run command 'journalctl': No such file or directory
Tue Feb 27 00:55:31 UTC 2024 Capturing program information
Dump already generated: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_55_28.44, moving to /var/opt/mssql/log/core.sqlservr.44.temp/core.sqlservr.44.gdmp
Moving logs to /var/opt/mssql/log/core.sqlservr.44.temp/log/paldumper-debug.log
Tue Feb 27 00:55:31 UTC 2024 Capturing program binaries
Tue Feb 27 00:55:31 UTC 2024 Not compressing the dump files, moving instead to: /var/opt/mssql/log/core.sqlservr.02_27_2024_00_55_29.44.d
Running mcr.microsoft.com/mssql/server:2022-CU11-ubuntu-22.04 directly; the issue seems to be upstream:
$ docker run --rm -it -e ACCEPT_EULA=y mcr.microsoft.com/mssql/server:2022-CU11-ubuntu-22.04
SQL Server 2022 will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
This program has encountered a fatal error and cannot continue running at Tue Feb 27 00:57:11 2024
The following diagnostic information is available:
Reason: 0x00000001
Signal: SIGABRT - Aborted (6)
Stack:
IP Function
---------------- --------------------------------------
0000563621faece1 std::__1::bad_function_call::~bad_function_call()+0x96661
0000563621fae6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
0000563621fadc2f std::__1::bad_function_call::~bad_function_call()+0x955af
00007efee8242520 __sigaction+0x50
00007efee82969fc pthread_kill+0x12c
00007efee8242476 raise+0x16
00007efee82287f3 abort+0xd3
0000563621f7fd96 std::__1::bad_function_call::~bad_function_call()+0x67716
0000563621fbc5b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
0000563621fea318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
0000563621fea0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
0000563621f8620a std::__1::bad_function_call::~bad_function_call()+0x6db8a
0000563621f85e80 std::__1::bad_function_call::~bad_function_call()+0x6d800
Process: 9 - sqlservr
Thread: 99 (application thread 0x180)
Instance Id: 4f963587-d91e-4f3c-8eca-4e781a1c7ec9
Crash Id: aed92722-583a-4ef9-9f97-6c1a249ad28f
Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
Distribution: Ubuntu 22.04.3 LTS
Processors: 6
Total Memory: 16764084224 bytes
Timestamp: Tue Feb 27 00:57:11 2024
Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 9
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_57_11.9
Executing: /opt/mssql/bin/handle-crash.sh with parameters
handle-crash.sh
/opt/mssql/bin/sqlservr
9
/opt/mssql/bin
/var/opt/mssql/log/
4f963587-d91e-4f3c-8eca-4e781a1c7ec9
aed92722-583a-4ef9-9f97-6c1a249ad28f
/var/opt/mssql/log/core.sqlservr.2_27_2024_0_57_11.9
Ubuntu 22.04.3 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/9/maps: Permission denied
^Ccat: /proc/9/environ: No such file or directory
find: '/proc/9': No such file or directory
find: '/proc/9': No such file or directory
find: '/proc/9': No such file or directory
find: '/proc/9': No such file or directory
dmesg: read kernel buffer failed: Operation not permitted
timeout: failed to run command 'journalctl': No such file or directory
timeout: failed to run command 'journalctl': No such file or directory
Tue Feb 27 00:57:13 UTC 2024 Capturing program information
Dump already generated: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_57_11.9, moving to /var/opt/mssql/log/core.sqlservr.9.temp/core.sqlservr.9.gdmp
Moving logs to /var/opt/mssql/log/core.sqlservr.9.temp/log/paldumper-debug.log
Tue Feb 27 00:57:14 UTC 2024 Capturing program binaries
Tue Feb 27 00:57:14 UTC 2024 Not compressing the dump files, moving instead to: /var/opt/mssql/log/core.sqlservr.02_27_2024_00_57_12.9.d
Seems upstream is fixed: https://github.com/microsoft/mssql-docker/issues/868#issuecomment-1998288674.
$ uname -r
6.7.9-200.fc39.x86_64
$ docker pull mcr.microsoft.com/mssql/server:2022-CU12-ubuntu-22.04
$ docker run --rm -it -e ACCEPT_EULA=y mcr.microsoft.com/mssql/server:2022-CU12-ubuntu-22.04
...
2024-03-14 21:05:52.25 spid22s Using 'dbghelp.dll' version '4.0.5'
2024-03-14 21:05:52.31 spid23s Recovery is complete. This is an informational message only. No user action is required.
2024-03-14 21:05:52.43 spid31s The default language (LCID 0) has been set for engine and full-text services.
2024-03-14 21:05:52.99 spid31s The tempdb database has 6 data file(s).
^C2024-03-14 21:06:52.85 spid23s Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.
2024-03-14 21:06:52.86 spid23s SQL Server shutdown due to Ctrl-C or Ctrl-Break signal. This is an informational message only. No user action is required.
2024-03-14 21:06:53.87 spid23s SQL Server Agent service is not running.
2024-03-14 21:06:53.88 spid23s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
Fingers crossed for next Bitwarden self-hosted release!