[Bug]: MsSql health check does not complete on newest container image

Open CCThorstenSauter opened this issue 1 year ago • 17 comments

Testcontainers version

3.9.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host arch

x64

.NET version

8.0.303

Docker version

Client:
 Version:           25.0.5
 API version:       1.44
 Go version:        go1.21.10
 Git commit:        d260a54c81efcc3f00fe67dee78c94b16c2f8692
 Built:             Sun May 12 07:25:43 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          25.0.5
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.10
  Git commit:       e63daec8672d77ac0b2b5c262ef525c7cf17fd20
  Built:            Sun May 12 07:25:43 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.10
  GitCommit:        4e1fe7492b9df85914c389d1f15a3ceedbb280ac
 runc:
  Version:          1.1.12
  GitCommit:        51d5e94601ceffbbd85688df1c928ecccbfa4685
 docker-init:
  Version:          0.19.0
  GitCommit:

Docker info

Client:
 Version:    25.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx

Server:
 Containers: 29
  Running: 5
  Paused: 0
  Stopped: 24
 Images: 15
 Server Version: 25.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 4e1fe7492b9df85914c389d1f15a3ceedbb280ac
 runc version: 51d5e94601ceffbbd85688df1c928ecccbfa4685
 init version:
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.153.1-microsoft-standard-WSL2
 Operating System: Rancher Desktop WSL Distribution
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 15.58GiB
 Name: CCD-0024
 ID: 398be532-db59-47b3-bcf7-d989f4f09517
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

What happened?

When using the MsSql package with the newest container image mcr.microsoft.com/mssql/server:2022-latest with a digest of sha256:c1aa8afe9b06eab64c9774a4802dcd032205d1be785b1fd51e1c0151e7586b74, the health check specified in the waiting strategy never completes, even though the logs of the SQL server container show it being ready, leading to a timeout.

This behavior is not present when using a slightly older container image version, e.g. mcr.microsoft.com/mssql/server:2022-CU13-ubuntu-22.04 with a digest of sha256:c4369c38385eba011c10906dc8892425831275bb035d5ce69656da8e29de50d8.

Relevant log output

[testcontainers.org 00:00:00.38] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:00.38] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:00.38] Searching Docker registry credential in Auths
[testcontainers.org 00:00:00.38] Docker registry credential https://index.docker.io/v1/ found
[testcontainers.org 00:00:01.50] Docker image testcontainers/ryuk:0.6.0 created
[testcontainers.org 00:00:01.58] Docker container 8d1b2fa17535 created
[testcontainers.org 00:00:01.64] Start Docker container 8d1b2fa17535
[testcontainers.org 00:00:01.96] Wait for Docker container 8d1b2fa17535 to complete readiness checks
[testcontainers.org 00:00:01.96] Docker container 8d1b2fa17535 ready
[testcontainers.org 00:00:01.97] Searching Docker registry credential in Auths
[testcontainers.org 00:00:01.97] Searching Docker registry credential in Auths
[testcontainers.org 00:00:01.97] Searching Docker registry credential in CredHelpers
[testcontainers.org 00:00:01.97] Searching Docker registry credential in CredsStore
[testcontainers.org 00:00:01.97] Docker registry credential mcr.microsoft.com not found
[testcontainers.org 00:00:18.94] Docker image mcr.microsoft.com/mssql/server:2022-latest created
[testcontainers.org 00:00:18.96] Docker container 4a3b482d21c9 created
[testcontainers.org 00:00:18.97] Start Docker container 4a3b482d21c9
[testcontainers.org 00:00:19.20] Wait for Docker container 4a3b482d21c9 to complete readiness checks
[testcontainers.org 00:00:19.20] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:20.27] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:21.32] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:22.42] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:23.58] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
[testcontainers.org 00:00:24.69] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9
...
[testcontainers.org 00:03:37.79] Execute "/opt/mssql-tools/bin/sqlcmd -Q SELECT 1;" at Docker container 4a3b482d21c9

Additional information

No response

CCThorstenSauter, Jul 24 '24 10:07

We see the same problem at the moment.

butzei, Jul 24 '24 10:07

This is also affecting us when it's running inside our GitHub Actions for CI/CD. It's currently preventing us from doing any releases.

intrepid-developer, Jul 24 '24 10:07

I confirm that our tests using TestContainers and MsSQL stopped passing today 🤕

szl-spyro, Jul 24 '24 11:07

Looking at the image, it seems that the path for sqlcmd has changed from /opt/mssql-tools/bin/sqlcmd to /opt/mssql-tools18/bin/sqlcmd. Not sure whether this was intentional.

pascalberger, Jul 24 '24 11:07

FYI someone has reported it on MSSQL-Docker: https://github.com/microsoft/mssql-docker/issues/892

intrepid-developer, Jul 24 '24 11:07

As mentioned in Slack, we likely need to adapt the default wait strategy (see https://github.com/testcontainers/testcontainers-dotnet/blob/develop/src/Testcontainers.MsSql/MsSqlBuilder.cs#L132-L145).

Users can provide their own wait strategy configuration as a workaround.

kiview, Jul 24 '24 11:07

This started blocking our Azure DevOps pipeline yesterday.

jonathaneckman, Jul 24 '24 12:07

After combining @pascalberger's comment with @kiview's, I first ran into certificate issues:

Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : SSL Provider: [error:0A000086:SSL routines::certificate verify failed:self-signed certificate]. Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : Client unable to establish connection. For solutions related to encryption errors, see https://go.microsoft.com/fwlink/?linkid=2226722.

But I got it working for now by also adding the -C option:

.WithWaitStrategy(
    Wait.ForUnixContainer()
        .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;")
)

Fireblade954, Jul 24 '24 12:07

This works when run locally, but still times out when run in an Azure DevOps pipeline:

new MsSqlBuilder()
        .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
        .WithEnvironment("ACCEPT_EULA", "Y")
        .WithPortBinding(11143, 1433)
        .WithWaitStrategy(
            Wait.ForUnixContainer()
                .UntilCommandIsCompleted(
                    "/opt/mssql-tools/bin/sqlcmd",
                    "-C",
                    "-Q",
                    "SELECT 1;"
                )
        )
        .Build();

This times out in both:

new MsSqlBuilder()
        .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
        .WithEnvironment("ACCEPT_EULA", "Y")
        .WithPortBinding(11143, 1433)
        .WithWaitStrategy(
            Wait.ForUnixContainer()
                .UntilCommandIsCompleted(
                    "/opt/mssql-tools18/bin/sqlcmd",
                    "-C",
                    "-Q",
                    "SELECT 1;"
                )
        )
        .Build();

jonathaneckman, Jul 24 '24 16:07

I have been able to replicate this locally by deleting the cached 2022-latest container image. After it downloads the latest image, it hangs indefinitely.

Adding .WithWaitStrategy( Wait.ForUnixContainer() .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;") ) resolved the issue. Thanks @Fireblade954!

tscrip, Jul 24 '24 16:07

@tscrip that did it. We missed a test project so had a false negative. Thanks!

jonathaneckman, Jul 24 '24 17:07

This is also affecting .NET Aspire - https://github.com/dotnet/aspire/issues/5057

eerhardt, Jul 24 '24 21:07

After reading all these comments, I would like to point out that we recommend pinning the image version. Using the latest tag does not automatically update the cached image on your development machine; it will use the version it pulled weeks ago. Meanwhile, the ephemeral CI pipeline pulls the actual latest version because it is not cached (this may lead to different behaviors on developer machines and in the CI pipeline).
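As an illustration of pinning, the sketch below uses the older tag named in the issue description, which the reporter found unaffected. It assumes the Testcontainers.MsSql package is referenced; nothing else is changed from the module defaults.

```csharp
using Testcontainers.MsSql;

// Pin to a specific cumulative-update tag instead of the moving 2022-latest tag,
// so developer machines and CI pull the same image.
var container = new MsSqlBuilder()
    .WithImage("mcr.microsoft.com/mssql/server:2022-CU13-ubuntu-22.04")
    .Build();

await container.StartAsync();
```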

Since it looks like the new path will remain (https://github.com/microsoft/mssql-docker/issues/892#issuecomment-2249029917), we can update the default wait strategy for the new version. Overriding the wait strategy, as @Fireblade954 suggested, or pinning the version are workarounds to avoid this issue.

We can probably do something similar to what we are doing in the MongoDB module to determine which binary (path) is available.
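One possible way to cover both image generations without a custom wait class is to let a shell inside the container try each known path in turn. This is only a sketch, not what the MongoDB module does: `SqlCmdProbe` and `BuildCommand` are hypothetical names that do not exist in Testcontainers, the two paths are the ones reported in this thread, and it assumes the image provides /bin/sh.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Testcontainers): builds a shell command that
// tries each known sqlcmd location until one of them succeeds.
public static class SqlCmdProbe
{
    // Paths reported in this thread: the new image first, then the older one.
    public static readonly string[] CandidatePaths =
    {
        "/opt/mssql-tools18/bin/sqlcmd",
        "/opt/mssql-tools/bin/sqlcmd",
    };

    public static string[] BuildCommand(params string[] paths)
    {
        if (paths.Length == 0)
        {
            throw new ArgumentException("At least one sqlcmd path is required.", nameof(paths));
        }

        // -C trusts the self-signed server certificate, which the tools18 binary requires.
        var attempts = paths.Select(path => $"{path} -C -Q 'SELECT 1;'");

        // "a || b" runs b only if a fails, e.g. because the binary is missing.
        return new[] { "/bin/sh", "-c", string.Join(" || ", attempts) };
    }
}
```

The result could then be passed to the existing wait strategy API, for example `Wait.ForUnixContainer().UntilCommandIsCompleted(SqlCmdProbe.BuildCommand(SqlCmdProbe.CandidatePaths))`. The shell exits with a non-zero code until one of the binaries both exists and answers the query, so the wait keeps retrying as before.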

HofmeisterAn, Jul 25 '24 05:07

BTW, this will also break the ExecScriptAsync method, as it also uses sqlcmd. Additionally, the new tools now default to requiring encryption, which means you need to pass -C to sqlcmd to tell it to trust the server certificate.

jwyza-pi, Jul 30 '24 16:07
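Until the module is updated, a script could presumably be run by calling sqlcmd directly through ExecAsync instead of ExecScriptAsync. A rough sketch, assuming the new tools18 path and the default "sa" user; the password and the query are placeholders chosen for this example only.

```csharp
using System;
using DotNet.Testcontainers.Builders;
using Testcontainers.MsSql;

// Placeholder password for this example only.
const string password = "yourStrong(!)Password";

var container = new MsSqlBuilder()
    .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
    .WithPassword(password)
    // Override the default wait strategy, as suggested earlier in this thread.
    .WithWaitStrategy(
        Wait.ForUnixContainer()
            .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;"))
    .Build();

await container.StartAsync();

var result = await container.ExecAsync(new[]
{
    "/opt/mssql-tools18/bin/sqlcmd",
    "-C",            // trust the self-signed server certificate
    "-b",            // return a non-zero exit code if the batch fails
    "-U", "sa",
    "-P", password,
    "-Q", "SELECT @@VERSION;",
});

// sqlcmd may report errors on either stream, so print both.
Console.WriteLine($"Exit code: {result.ExitCode}");
Console.WriteLine(result.Stdout);
Console.WriteLine(result.Stderr);
```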

This works for us:

.WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(1433))

randsu, Aug 05 '24 14:08

> This works for us:
>
> .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(1433))

This won't always work as the container might be ready but MSSQL might not be ready to receive requests.

Xor-el, Aug 05 '24 14:08
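The two checks do not have to be alternatives; the wait strategy builder allows chaining them. A minimal sketch, assuming the new tools18 path from earlier in this thread, where the port check comes first and the query check confirms that SQL Server actually answers:

```csharp
using DotNet.Testcontainers.Builders;
using Testcontainers.MsSql;

var container = new MsSqlBuilder()
    .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
    .WithWaitStrategy(
        Wait.ForUnixContainer()
            // Cheap check: something is listening on the SQL Server port.
            .UntilPortIsAvailable(1433)
            // Stronger check: the server accepts a connection and runs a query.
            .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;"))
    .Build();

await container.StartAsync();
```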

> > This works for us: .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(1433))
>
> This won't always work as the container might be ready but MSSQL might not be ready to receive requests.

That's true, although it fails very rarely, at least for us, and it usually works regardless of whether the old or the new image from Microsoft is used.

I have now rewritten it to use .WithWaitStrategy( Wait.ForUnixContainer() .UntilCommandIsCompleted("/opt/mssql-tools18/bin/sqlcmd", "-C", "-Q", "SELECT 1;") ) and downloaded the new image locally.

randsu, Aug 06 '24 10:08