
Unable to start container on Linux 6.7

Open quinnjr opened this issue 1 year ago • 77 comments

Currently unable to start the container on Arch Linux as the host OS. The dump files for the failing sqlservr process don't really provide any insight as to why:

docker compose logs db
db-1  | SQL Server 2022 will run as non-root by default.
db-1  | This container is running as user mssql.
db-1  | Your master database file is owned by mssql.
db-1  | To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
db-1  | This program has encountered a fatal error and cannot continue running at Mon Jan 15 18:19:00 2024
db-1  | The following diagnostic information is available:
db-1  | 
db-1  |          Reason: 0x00000001
db-1  |          Signal: SIGABRT - Aborted (6)
db-1  |           Stack:
db-1  |                  IP               Function
db-1  |                  ---------------- --------------------------------------
db-1  |                  000064eb280a3ce1 std::__1::bad_function_call::~bad_function_call()+0x96661
db-1  |                  000064eb280a36a6 std::__1::bad_function_call::~bad_function_call()+0x96026
db-1  |                  000064eb280a2c2f std::__1::bad_function_call::~bad_function_call()+0x955af
db-1  |                  00007c18f8810520 __sigaction+0x50
db-1  |                  00007c18f88649fc pthread_kill+0x12c
db-1  |                  00007c18f8810476 raise+0x16
db-1  |                  00007c18f87f67f3 abort+0xd3
db-1  |                  000064eb28074d96 std::__1::bad_function_call::~bad_function_call()+0x67716
db-1  |                  000064eb280b15b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
db-1  |                  000064eb280df318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
db-1  |                  000064eb280df0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
db-1  |                  000064eb2807b20a std::__1::bad_function_call::~bad_function_call()+0x6db8a
db-1  |                  000064eb2807ae80 std::__1::bad_function_call::~bad_function_call()+0x6d800
db-1  |         Process: 10 - sqlservr
db-1  |          Thread: 157 (application thread 0x264)
db-1  |     Instance Id: 83ef72ce-1100-44c4-913c-45d0df61ae44
db-1  |        Crash Id: 05e56c63-9bd1-47db-b3d5-c1f58cebd578
db-1  |     Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
db-1  |    Distribution: Ubuntu 22.04.3 LTS
db-1  |      Processors: 32
db-1  |    Total Memory: 67119079424 bytes
db-1  |       Timestamp: Mon Jan 15 18:19:00 2024
db-1  |      Last errno: 2
db-1  | Last errno text: No such file or directory
db-1  | Capturing a dump of 10
db-1  | Successfully captured dump: /var/opt/mssql/log/core.sqlservr.1_15_2024_18_19_0.10
db-1  | Executing: /opt/mssql/bin/handle-crash.sh with parameters
db-1  |      handle-crash.sh
db-1  |      /opt/mssql/bin/sqlservr
db-1  |      10
db-1  |      /opt/mssql/bin
db-1  |      /var/opt/mssql/log/
db-1  |      
db-1  |      83ef72ce-1100-44c4-913c-45d0df61ae44
db-1  |      05e56c63-9bd1-47db-b3d5-c1f58cebd578
db-1  |      
db-1  |      /var/opt/mssql/log/core.sqlservr.1_15_2024_18_19_0.10
db-1  | 
db-1  | Ubuntu 22.04.3 LTS
db-1  | Capturing core dump and information to /var/opt/mssql/log...

Docker-compose file:

version: '3'
services:
  db:
    image: 'mcr.microsoft.com/mssql/server:2022-latest'
    environment:
      - ACCEPT_EULA=Y
      - MSSQL_SA_PASSWORD=<there would be a password here>
      - MSSQL_PID=Developer
    volumes:
      - ./logs:/var/opt/mssql/log
      - ./data:/var/opt/mssql/data
    ports:
      - 1433:1433

Docker logs and data directory are set as UID:GID 10001:10001.
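
To rule out a permissions problem in the same way, the ownership of the bind-mounted directories can be checked from the host. This is only a minimal sketch: it assumes the directories are ./logs and ./data as in the compose file and that 10001:10001 is the owner the container expects, as stated above. The helper name `ownership_mismatches` is hypothetical.

```python
import os

MSSQL_UID = 10001  # UID stated in the post above
MSSQL_GID = 10001  # GID stated in the post above


def ownership_mismatches(paths, uid=MSSQL_UID, gid=MSSQL_GID):
    """Return (path, actual_uid, actual_gid) for every path not owned by uid:gid."""
    mismatches = []
    for path in paths:
        st = os.stat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            mismatches.append((path, st.st_uid, st.st_gid))
    return mismatches


if __name__ == "__main__":
    # Directory names taken from the volume mappings in the compose file.
    existing = [p for p in ("./logs", "./data") if os.path.exists(p)]
    for path, uid, gid in ownership_mismatches(existing):
        print(f"{path}: owned by {uid}:{gid}, expected {MSSQL_UID}:{MSSQL_GID}")
```

An empty result only means ownership matches; it says nothing about why sqlservr aborts.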

quinnjr avatar Jan 15 '24 18:01 quinnjr

I have the same issue. Found that it's the 6.7 kernel update. (https://github.com/microsoft/mssql-docker/issues/858#issuecomment-1892216070)

Rolling back to 6.6.10 makes it work again.
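
Since containers share the host kernel, a quick host-side check can tell whether a machine falls in the range reported here. A minimal sketch, assuming that every kernel from 6.7 onward behaves like the 6.7.x versions reported in this thread and that earlier kernels work; the function names are hypothetical.

```python
import platform
import re


def kernel_version(release):
    """Parse a release such as '6.7.1-arch1-1' into (6, 7, 1); a missing patch level becomes 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def reported_as_failing(release):
    """True for kernel 6.7 and later.

    Assumption: kernels newer than those reported in this thread behave like 6.7.x.
    """
    return kernel_version(release)[:2] >= (6, 7)


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    state = "reported as failing" if reported_as_failing(release) else "reported as working"
    print(f"host kernel {release}: {state}")
```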

erikbozic avatar Jan 15 '24 18:01 erikbozic

I experienced the same behavior today. First my existing container grew in size very quickly. I tried creating other containers but they all failed with the above message.

It took me a while to figure out that downgrading my kernel fixes the issue, but downgrading to 6.6.11 did the trick.

thomasvm avatar Jan 15 '24 19:01 thomasvm

I can also confirm the same behaviour. It works with kernel 6.6, and with 6.7 I get a similar message to the one above.

unlogicalcode avatar Jan 16 '24 13:01 unlogicalcode

I downgraded my kernel and the container now functions.

Is this limited to just this container, or does Docker need to update something to be compatible with the 6.7 kernel?

quinnjr avatar Jan 16 '24 16:01 quinnjr

I have the same problem running the container in Podman, but the Docker container runs without any problem. I simply pulled the image with sudo podman pull mcr.microsoft.com/mssql/server:2022-latest, and ran it:

sudo podman run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Str0ngPass!" -p 1433:1433 --name sql-test --hostname sql-test -d  mcr.microsoft.com/mssql/server:2022-latest

Attached is a log file. sql-test.log

huestack avatar Jan 17 '24 10:01 huestack

Can confirm on Arch Linux, both the docker images for versions 2017, 2019 and 2022 and the AUR version give the same result.

Last errno text: No such file or directory

After downgrading the kernel to version 6.6.10-arch1-1 it starts successfully.

LJFloor avatar Jan 17 '24 18:01 LJFloor

I can confirm this on Nobara 39 with the 6.7.0 kernel. Exactly the same issue for mssql 2017, 2019 and 2022. 6.6.9 works fine.

CodeKJ avatar Jan 18 '24 10:01 CodeKJ

It seems this was solved in the AUR package mssql-server: https://aur.archlinux.org/packages/mssql-server#comment-953063. However, I'm still having trouble building the needed dependency to verify...

erikbozic avatar Jan 23 '24 07:01 erikbozic

For what it is worth:

I am running Gentoo with a custom 6.7.x kernel. It looks like it fails trying to access the cgroup v1 file "/sys/fs/cgroup/memory/memory.limit_in_bytes". I suspect that switching cgroups to "hybrid" mode would fix the issue, but I am not up to rebooting my machine right now.

$ docker run -it --rm -e ACCEPT_EULA=Y -e MSSQL_PID=Developer mcr.microsoft.com/mssql/server:2022-latest -- /bin/bash
sleep 1000

in another terminal, run

ps fax|less
# find pid of bash which is parent of sleep
sudo strace -o mssql.strace -f -s1000 -p <bash-in-mssql-docker>

Return to the first terminal, press Ctrl-C to stop the sleep, then run /opt/mssql/bin/sqlservr and wait for it to crash. Go to the second terminal and interrupt strace.

$ grep -P '"/(proc|sys).*ENOENT' mssql.strace
9999 openat(AT_FDCWD, "/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = -1 ENOENT (No such file or directory)
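
A missing file is not necessarily what triggers the abort, so it can help to group every failed call in the trace by errno rather than grepping for ENOENT alone. A minimal sketch, assuming the trace was written with `strace -f -o` so that each line starts with a PID; the name `failed_calls` is hypothetical, and calls split across "unfinished"/"resumed" lines are ignored.

```python
import re
from collections import Counter

# Matches a failed syscall line as strace prints it, for example:
# 9999 openat(AT_FDCWD, "/sys/...", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(r'^\d+\s+(\w+)\(.*\)\s+= -1 (E[A-Z0-9]+) \(')
QUOTED_PATH = re.compile(r'"(/[^"]*)"')


def failed_calls(lines):
    """Count (syscall, errno name, first absolute path argument) over strace output lines."""
    counts = Counter()
    for line in lines:
        call = FAILED_CALL.match(line)
        if call is None:
            continue  # successful call, signal line, or a split "unfinished" call
        path = QUOTED_PATH.search(line)
        key = (call.group(1), call.group(2), path.group(1) if path else None)
        counts[key] += 1
    return counts
```

Calling `failed_calls(open("mssql.strace"))` and printing `most_common()` lists the most frequent failures first.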

kshpytsya avatar Jan 23 '24 14:01 kshpytsya

I think the ENOENT is not the issue, especially not /sys/fs/cgroup/memory/memory.limit_in_bytes since this doesn't exist on Kernel 6.6.13 either, and mssql runs fine there. My crashlogs on 6.7.1 showed Invalid argument / 22 / EINVAL:

This program has encountered a fatal error and cannot continue running at Mon Jan 22 18:09:17 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 0000613cdff2ace1 std::__1::bad_function_call::~bad_function_call()+0x96661
                 0000613cdff2a6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
                 0000613cdff29c2f std::__1::bad_function_call::~bad_function_call()+0x955af
                 0000753f7ee4d520 __sigaction+0x50
                 0000753f7eea19fc pthread_kill+0x12c
                 0000753f7ee4d476 raise+0x16
                 0000753f7ee337f3 abort+0xd3
                 0000613cdfefbd96 std::__1::bad_function_call::~bad_function_call()+0x67716
        Process: 10 - sqlservr
         Thread: 161 (application thread 0x278)
    Instance Id: ba778b4b-ea20-4f3c-98fa-2002d4c8e68c
       Crash Id: 3674de73-5de7-494e-8530-2520421dd97f
    Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
   Distribution: Ubuntu 22.04.3 LTS
     Processors: 16
   Total Memory: 29180137472 bytes
      Timestamp: Mon Jan 22 18:09:17 2024
     Last errno: 22
Last errno text: Invalid argument
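
To compare crash reports like this one across machines, the "Key: value" lines can be pulled out and the numeric errno mapped to its symbolic name. A minimal sketch, assuming the report layout shown in this thread (optionally prefixed with "db-1  | " by docker compose logs); the function names are hypothetical.

```python
import errno
import re

# Optional "db-1  | " style prefix from `docker compose logs`, then "Key: value".
FIELD = re.compile(r"^(?:\S+\s+\|\s)?\s*([A-Za-z][A-Za-z ]*?):\s+(.*\S)\s*$")


def crash_fields(lines):
    """Collect the 'Key: value' lines of a crash report into a dict (first value wins)."""
    fields = {}
    for line in lines:
        match = FIELD.match(line)
        if match:
            fields.setdefault(match.group(1), match.group(2))
    return fields


def last_errno_name(fields):
    """Translate the numeric 'Last errno' field into its symbolic name, or None if unknown."""
    return errno.errorcode.get(int(fields["Last errno"]))
```

On Linux, errno 2 maps to ENOENT and errno 22 to EINVAL, which matches the two "Last errno text" values seen in the reports here.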

ibauersachs avatar Jan 23 '24 14:01 ibauersachs

The problem is still there with kernel 6.7.2

CryptoSiD avatar Jan 26 '24 03:01 CryptoSiD

same problem on 6.7.1-arch1-1

Green0wl avatar Jan 26 '24 17:01 Green0wl

As a bad side effect, the lsof process it spawns starts eating a whole CPU core.

GieltjE avatar Jan 28 '24 19:01 GieltjE

Hello,

The issue has been identified and should be fixed in the next SQL Server 2022 CU, but we cannot commit to a specific CU release or timeline as sometimes plans can unexpectedly change. No further investigation or data points should be needed for this, but thank you all for reporting and for looking for potential causes.

It is unrelated to cgroups, and at first glance it might be a kernel bug (but do not quote me on this) - it appears that as of 6.7, mmap without MAP_FIXED may sometimes ignore the address hint even if the hinted region is in fact available. I have not investigated the kernel side of things further, but I think it might be related to this series of changes and/or its preceding/following changes.

Knowing this, I cannot think of any workaround other than sticking to 6.6 in the meantime.
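
For anyone curious about the pattern described above, here is a rough probe of whether an mmap address hint is honoured when the hinted range is free. It is only a sketch under assumptions: the 4 MiB length and the function names are arbitrary choices, it calls libc through ctypes, and it does not reproduce what SQL Server actually does, so a True result does not prove a kernel is unaffected.

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL(None, use_errno=True)  # symbols of the already-loaded C library
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *)-1 as an unsigned integer


def anon_map(hint, length):
    """Create an anonymous private mapping without MAP_FIXED; `hint` may be None."""
    addr = libc.mmap(hint, length, mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return addr


def hint_honoured(length=4 * 1024 * 1024):
    """Map a region, free it, then request the same address again as a plain hint."""
    free_addr = anon_map(None, length)
    libc.munmap(free_addr, length)      # the range is now known to be available
    addr = anon_map(free_addr, length)  # ask for it back, as a hint only
    libc.munmap(addr, length)
    return addr == free_addr
```

Running `hint_honoured()` on a 6.6 host and on a 6.7 host would show whether this particular pattern differs between them.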

fbrosseau avatar Jan 30 '24 23:01 fbrosseau

Thank you very much for the patch. Are there plans to also backport it to 2019?

vermarine avatar Feb 13 '24 12:02 vermarine

Just wanted to say I am so glad you have all written on here. I didn't even think about the fact that I had just upgraded my Arch system, and I was about to start tearing things apart, so this has saved me a heck of a lot of time. While I am here to say thank you, I can also confirm this is still happening on Arch Linux on 6.7.4.

jaddie avatar Feb 14 '24 02:02 jaddie

Hi! We are running an MSSQL-based project on a Mac and use the image mcr.microsoft.com/mssql/server:2019-latest through Podman. Podman has not been able to start a container with this image since the kernel was updated. How can we revert the kernel version of the host, or is there another workaround? Any help would be highly appreciated. Thanks!

massouji82 avatar Feb 15 '24 10:02 massouji82

Same issue with Fedora 39 on 6.7.2 and 6.7.3, but fine on 6.6.x and 6.5.x (in case anyone is searching for this issue and using Fedora). Looking forward to the CU @fbrosseau

johnvanham avatar Feb 17 '24 12:02 johnvanham

I think MSFT should strongly consider backporting this at least to SQL Server 2019, if not 2017 as well. As people continue to upgrade their kernels, this is going to happen on an ever larger scale to existing SQL Server Linux / container installations.

zzzeek avatar Feb 17 '24 17:02 zzzeek

Am I missing something? I do not see any updated Docker images for mcr.microsoft.com/mssql/server:2022-latest that would make it run on 6.7.*.

kshpytsya avatar Feb 19 '24 14:02 kshpytsya

It should be included in the next CU; there is no date estimate.

I've been keeping an eye on this page for the next CU, presumably CU12, to be released.

markbeazley avatar Feb 19 '24 14:02 markbeazley

Not working on 6.7.5 either.

asergios avatar Feb 19 '24 15:02 asergios

I am glad I ran into this page. This started happening recently on Fedora 39. Kernel 6.7.4. I will test another kernel and report back.

Edit: Works on 6.6.13.

brunofin avatar Feb 20 '24 11:02 brunofin

mysql, pgsql and sqlite all work no problem. but m$ seems to be able to afford not to give a crap about a regression in the latest kernel. not amused.

daef avatar Feb 21 '24 10:02 daef

$ docker logs 9536fdc556e1
This program has encountered a fatal error and cannot continue running at Tue Feb 27 19:29:45 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 000056dc072752fc <unknown>
                 000056dc07274d42 <unknown>
                 000056dc07274351 <unknown>
                 00007c8fbb447090 killpg+0x40
                 00007c8fbb44700b gsignal+0xcb
                 00007c8fbb426859 abort+0x12b
                 000056dc071fb3d2 <unknown>
                 000056dc07287304 <unknown>
                 000056dc072bc388 <unknown>
                 000056dc072bc16a <unknown>
                 000056dc0720724a <unknown>
                 000056dc07206e9f <unknown>
        Process: 12 - sqlservr
         Thread: 83 (application thread 0x134)
    Instance Id: 252d75bf-d3a4-4b38-a78f-b83488b53759
       Crash Id: 855b8579-9053-4856-ad38-69e4a54d6ff6
    Build stamp: e149a9e980d9936d4f4a616b06112de0e7b2f4e45c2cd3a0884ae319ad3d13b7
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 12
   Total Memory: 16618233856 bytes
      Timestamp: Tue Feb 27 19:29:45 2024
     Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 12
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_19_29_45.12
Executing: /opt/mssql/bin/handle-crash.sh with parameters
     handle-crash.sh
     /opt/mssql/bin/sqlservr
     12
     /opt/mssql/bin
     /var/opt/mssql/log/

     252d75bf-d3a4-4b38-a78f-b83488b53759
     855b8579-9053-4856-ad38-69e4a54d6ff6

     /var/opt/mssql/log/core.sqlservr.2_27_2024_19_29_45.12

Ubuntu 20.04.6 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/12/maps: Permission denied
SQL server is unavailable - sleeping

Run-c0de avatar Feb 27 '24 19:02 Run-c0de

Any plans to upgrade the Docker image to resolve this issue?

MPavleski avatar Feb 29 '24 14:02 MPavleski

Well, first there needs to be a new CU release. The last one is from January 2024 and the pace seems to be about one release per month, so a new release should be expected soon. But the team is not communicating release dates, so we can only wait at this point.

Keep track of this page to see whether a new CU is released.

thomasvm avatar Mar 01 '24 17:03 thomasvm

It's a bit infuriating that we need to wait for a critical bug fix to land on a monthly cumulative update without being even certain whether it actually will.

It would be much more productive instead to post here instructions on how to migrate the database to postgres and be done with it lol

brunofin avatar Mar 01 '24 20:03 brunofin

What the hell same issue here

YusufMavzer avatar Mar 05 '24 08:03 YusufMavzer

When can we hope for CU12 that will include the fix?

It's been 2 weeks already.

CryptoSiD avatar Mar 05 '24 08:03 CryptoSiD