mssql-docker icon indicating copy to clipboard operation
mssql-docker copied to clipboard

Healthcheck

Open mrfuxi opened this issue 7 years ago • 20 comments

Would it be possible to implement Docker's HEALTHCHECK command?

mrfuxi avatar Jul 25 '17 13:07 mrfuxi

Yeah, now that we have sqlcmd in the image this should be much easier. I'll put it on our backlog internally. If you need something sooner, I'd suggest creating your own image FROM ours and specifying a HEALTCHECK that uses sqlcmd with -q to execute a simple query periodically like 'SELECT 1'.

twright-msft avatar Jul 25 '17 17:07 twright-msft

I too am having an issue where the docker image crashes, but continues to output lines of logging, which prevents my internal hung-build-killer tool to detect that something went wrong.

2017-09-06 09:50:11.21 spid90 Timeout occurred while waiting for latch: class 'LOG_MANAGER', id 00000005ED740BB0, type 4, Task 0x00000005EB49B088 : 0, waittime 71400 seconds, flags 0x1a, owning task 0x00000005EB492CA8. Continuing to wait. 2017-09-06 09:50:25.26 spid89 Timeout occurred while waiting for latch: class 'LOG_MANAGER', id 00000005ED740BB0, type 4, Task 0x00000005EEDE7088 : 0, waittime 71400 seconds, flags 0x1a, owning task 0x00000005EB492CA8. Continuing to wait. 2017-09-06 09:51:25.25 spid84 Timeout occurred while waiting for latch: class 'LOG_MANAGER', id 00000005ED740BB0, type 4, Task 0x00000005E800DC28 : 0, waittime 71400 seconds, flags 0x1a, owning task 0x00000005EB492CA8. Continuing to wait.

having a healthcheck mechanism in the case of this would be great.

sangohan avatar Sep 07 '17 05:09 sangohan

I get the same crash / error after using netcat to ascertain when the process has started listening on port 1433

manicmonkey avatar Nov 15 '17 15:11 manicmonkey

This is super useful in order to avoid having to develop custom poll and wait script to check whether the DB is up in order to be able to start other activities.

axelfontaine avatar Jan 26 '18 14:01 axelfontaine

Dockerfile:

FROM microsoft/mssql-server-linux
HEALTHCHECK --interval=10s --timeout=3s --start-period=10s --retries=10 \
    CMD sqlcmd -S localhost -U sa -P ${SA_PASSWORD} -Q "SELECT 1" || exit 1

krkabol avatar Apr 19 '18 07:04 krkabol

It should be

CMD /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${SA_PASSWORD} -Q "SELECT 1" || exit 1

Because sqlcmd is not on PATH

djechelon avatar Dec 03 '19 13:12 djechelon

Is there a simple way to use HEALTHCHECK? I use this piece of code, but maybe you can advise something simpler? https://github.com/ObjectivityLtd/DBTestCompare/blob/master/restoreBackup.ps1#L7 do { $JSON=docker inspect --format='{{json .State.Health}}' sqlserver-container | Out-String | ConvertFrom-Json Write-Host HEALTHCHECK: $JSON.Status.ToString() sqlserver-container Start-Sleep -s 10 } Until ($JSON.Status.ToString() -eq 'healthy')

raczeja avatar Mar 18 '20 07:03 raczeja

version: "3.4"
services:
    db:
        image: "mcr.microsoft.com/mssql/server:2019-latest"
        environment:
            SA_PASSWORD: "Secure-Password.1234"
            ACCEPT_EULA: "Y"
        healthcheck:
            test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$$SA_PASSWORD" -Q "SELECT 1" || exit 1
            interval: 10s
            timeout: 3s
            retries: 10
            start_period: 10s

tinohager avatar May 03 '20 18:05 tinohager

Everything that is written above must be changed: required with the option: -b and I recommend using the option: -o /dev/null in docker-compose.yaml

    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$${SA_PASSWORD}" -Q "SELECT 1" -b -o /dev/null
      interval: 10s
      timeout: 3s
      retries: 10
      start_period: 10s

in Dockerfile:

HEALTHCHECK --interval=10s --timeout=3s --start-period=10s --retries=10 \
  CMD /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $${SA_PASSWORD} -Q "SELECT 1" -b -o /dev/null

mrlioncub avatar Mar 11 '21 05:03 mrlioncub

I'm curious here, why do we need to have exit 1 if sqlcmd -Q already has a built-in exit?

 -Q "cmdline query" (and exit)  

https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-ver15#syntax

raschmitt avatar Mar 12 '21 22:03 raschmitt

Yes, exit 1 is not needed. Updated

mrlioncub avatar Mar 13 '21 15:03 mrlioncub

These solution don't take into account that during database migration after an update, the database can be locked for several minutes. To the developers: Is it possible to take that into account? Does sql-server have an idiomatic way to determine that it is currently upgrading databases? Are you maybe working on a HEALTHCHECK command that is working with that scenario?

The best idea so far to work around this is to add --start-period=10m to the healthcheck.

neur0manc avatar Jun 22 '21 13:06 neur0manc

Thanks @mrlioncub 🎉 I was searching for a solution for using the image in a GitHub Action and the healthcheck wasn't working...thankfully Google found your comment.

Passing -b -o /dev/null solved the issue for me. Sharing for future reference:

name: CI

on: push

jobs:
    test:
        name: Test
        runs-on: ubuntu-latest
        env:
            DB_USER: SA
            DB_PASSWORD: yourStrongP@ssword
            DB_NAME: your_db

        services:
            mssql:
                image: mcr.microsoft.com/mssql/server:2017-latest-ubuntu
                env:
                    ACCEPT_EULA: Y
                    SA_PASSWORD: ${{ env.DB_PASSWORD }}
                    MSSQL_PID: Express
                    DB_USER: ${{ env.DB_USER }}
                    DB_NAME: ${{ env.DB_NAME }}
                ports:
                    - 1433/tcp
                options: >-
                    --health-cmd "/opt/mssql-tools/bin/sqlcmd -U $DB_USER -P $SA_PASSWORD -Q 'select 1' -b -o /dev/null"
                    --health-interval 60s
                    --health-timeout 30s
                    --health-start-period 20s
                    --health-retries 3

        steps:
            - name: Create Database
              run: |
                  docker exec $(docker ps -alq) /opt/mssql-tools/bin/sqlcmd -U "$DB_USER" -P "$DB_PASSWORD" -Q "CREATE DATABASE $DB_NAME;"

            - uses: actions/checkout@v2

            - name: Setup Node Package Cache
              uses: actions/cache@v2
              with:
                  path: ~/.npm
                  key: ${{ runner.OS }}-node-${{ hashFiles('**/package-lock.json') }}

            - name: Install
              run: |
                  npm ci --prefer-offline --no-audit

            - name: Test
              env:
                  DB_HOST: localhost
                  DB_PORT: ${{ job.services.mssql.ports[1433] }}
                  DB_USER: ${{ env.DB_USER }}
                  DB_PASSWORD: ${{ env.DB_PASSWORD }}
                  DB_NAME: ${{ env.DB_NAME }}
              run: |
                  npm test

leomelzer avatar Jul 20 '21 12:07 leomelzer

yes ,I have the same problem,use these can't resolve. how to config sqlserver healthycheck?

YoChen avatar May 13 '22 12:05 YoChen

BEST PRACTICE


services:
  instance:
    container_name: sql-server-2022
    image: mcr.microsoft.com/mssql/server:2022-latest
    restart: unless-stopped
    environment:
      ACCEPT_EULA: Y
      MSSQL_SA_PASSWORD: SA-PASSWORD
      MSSQL_PID: Enterprise
    ports:
      - 2222:1433
    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "SA-PASSWORD" -Q "SELECT 1" -b -o /dev/null
      interval: 10s
      timeout: 3s
      retries: 10
      start_period: 10s
    volumes:
      - mssql:/var/opt/mssql
    networks:
      - internal

ErcinDedeoglu avatar Mar 26 '23 11:03 ErcinDedeoglu

I disagree with the "best practice" of using -o /dev/null in the parameters. If you do that then the healthcheck output as seen in docker container inspect <container_id> is an empty string - which can hide clues about which part of the network/SQL stack is responsible for the failure.

If you do not include -o /dev/null in the parameters then you can see the actual error message as demonstrated in this snippet:

{
    "Health": {
        "Status": "unhealthy",
        "FailingStreak": 1,
        "Log": [
            {
                "Start": "2023-05-23T05:41:54.699717693Z",
                "End": "2023-05-23T05:41:54.827199774Z",
                "ExitCode": 1,
                "Output": "Msg 102, Level 15, State 1, Server 0999316e8300, Line 1\nIncorrect syntax near 'Error'.\n"
            }
        ]
    }
}

antman2 avatar May 23 '23 05:05 antman2

Hi, I'm curious why it's best practice to execute a command with a clear text password that can be read from anywhere on the system e.g. with ps -a. Is there really no other option to ask the mssql process if it's running and at good health? Something like pgrep -f /opt/mssql/bin/sqlservr && exit 1 || exit 0 would at least be more secure, but presumably not as reliable.

she-trifork avatar Jan 30 '24 15:01 she-trifork

Hi, I'm curious why it's best practice to execute a command with a clear text password that can be read from anywhere on the system e.g. with ps -a. Is there really no other option to ask the mssql process if it's running and at good health? Something like pgrep -f /opt/mssql/bin/sqlservr && exit 1 || exit 0 would at least be more secure, but presumably not as reliable.

You have two options here:

a) use SQLCMDPASSWORD environment variable to pass the password b) use the new go-sqlcmd tool, which allows you to define a config file, from where it can read the password.

fabiang avatar Jan 30 '24 17:01 fabiang

since /opt/mssql-tools/bin/sqlcmd is NOT available on linux/arm64 (Mac M1, M2, etc) for mcr.microsoft.com/azure-sql-edge:latest

Can someone please provide a command that will work with whatever flavour of linux mcr.microsoft.com/azure-sql-edge:latest is using? This way no additional tools need to be installed, if possible.

I realize it's a different image, but the ask is fairly generic: how does one test for port being open, and accepting connections w/o using specific tooling such as sqlcmd

clearwaterstream avatar Feb 28 '24 17:02 clearwaterstream

@clearwaterstream Taking inspiration from StackOverflow [1], I was able to use this since timeout is available in the image:

healthcheck:
  test: timeout 1 bash -c 'cat < /dev/null > /dev/tcp/127.0.0.1/1433'
  1. https://stackoverflow.com/a/19866239/84473

k3n avatar Mar 13 '24 20:03 k3n