docker-images

SQL container fails boot when run in Kubernetes

Open NandGates opened this issue 4 years ago • 5 comments

Mapping the provided docker-compose files to a Kubernetes manifest leads to the SQL container failing to start.

I am using the completely unmodified Docker images generated by Build.ps1; the only change of note is mapping the environment variables into Kubernetes format. No other code or architectural changes have been made.

I have been triaging this for some time now and have isolated it to the use of Invoke-SqlCmd, which I believe has a strange timing issue when a Deployment starts the container. Even though no password is passed to Invoke-SqlCmd, the container returns:

Login failed for NT AUTHORITY\ANONYMOUS LOGON

Running the same query with sqlcmd works reliably every time.

Could we please consider replacing Invoke-SqlCmd in Boot.ps1 for both

  • windows\9.3.x\sitecore-xp-sqldev
  • windows\9.x.x\sitecore-xm-sqldev

with sqlcmd -Q, which is much more robust? We already use sqlcmd in the base mssql layer at windows\dependencies\mssql-developer-2017\Start.ps1, and it would be good to standardise.

I am currently running the snippet below in my Kubernetes manifest to replace Invoke-SqlCmd with sqlcmd, and it works as expected:

command: ['powershell']
args: ['-c', '(Get-Content ./Boot.ps1) -replace "Invoke-SqlCmd -Query", "sqlcmd -Q" | Out-File Boot.ps1; C:/Boot.ps1 -InstallPath $env:INSTALL_PATH -DataPath $env:DATA_PATH']
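For readers wondering where that snippet goes: in a Kubernetes container spec, command replaces the image's ENTRYPOINT and args replaces its CMD. The fragment below is a sketch of the workaround placed on the xm-sql container. It assumes, as the original command does, that the image defines INSTALL_PATH and DATA_PATH and that Boot.ps1 sits in C:\. [REGISTRY] is a placeholder, and the other fields (env, nodeSelector) are omitted.

```yaml
containers:
  - name: xm-sql
    # Placeholder registry, quoted so the fragment parses as YAML
    image: "[REGISTRY]/sitecore-xm-sqldev"
    imagePullPolicy: Always
    # Overrides the image ENTRYPOINT
    command: ['powershell']
    # Overrides the image CMD: rewrite Boot.ps1 to use sqlcmd -Q, then run it.
    # Assumes INSTALL_PATH and DATA_PATH are set by the image.
    args:
      - '-c'
      - '(Get-Content ./Boot.ps1) -replace "Invoke-SqlCmd -Query", "sqlcmd -Q" | Out-File Boot.ps1; C:/Boot.ps1 -InstallPath $env:INSTALL_PATH -DataPath $env:DATA_PATH'
```

Because -replace rewrites every occurrence, any line in Boot.ps1 that also passes other Invoke-SqlCmd parameters would need adjusting, since sqlcmd uses different switches.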

I am happy to supply a PR if you believe this change is useful and likely to be approved.

NandGates • Apr 27 '20 04:04

Can you share your K8s manifest templates?

bplasmeijer • Apr 27 '20 11:04

Hi @bplasmeijer,

Thanks for the response; it is much appreciated. I know you are busy!

I've included the Kubernetes Deployment manifest below. For completeness, the environment details are:

  • Running in Azure Kubernetes Service (AKS)
  • Running on a Windows node
  • Nothing unusual beyond pointing the manifest at an Azure Container Registry (ACR) that holds the built Sitecore image

I'm starting to believe this is a timing issue caused by the container command being executed too early, so I am considering adding a health probe. If this addresses the issue, I will post here.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sitecore-sql-xm
  namespace: sitecore
  labels:
    app: sitecore
spec:
  selector:
    matchLabels:
      app: sitecore
      role: xm-sql
  template:
    metadata:
      labels:
        app: sitecore
        role: xm-sql
    spec:
      containers:
        - name: xm-sql
          image: [REGISTRY]/sitecore-xm-sqldev
          imagePullPolicy: Always
          env:
          - name: SA_PASSWORD
            value: "8Tombs-Given-Clock#-arming-Alva-debut-Spine-monica-Normal-Ted-About1-chard-Easily-granddad-5Context!"
          - name: ACCEPT_EULA
            value: "Y"
      nodeSelector:
        agentpool: win

NandGates • Apr 27 '20 23:04

I can confirm this is due to a timing issue with container readiness. Unfortunately, AKS does not currently support startupProbe declarations (https://github.com/Azure/AKS/issues/1550), but the readinessProbe below has solved the issue.

Closing, but posting the solution below for any future readers.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sitecore-sql-xp
  namespace: sitecore
  labels:
    app: sitecore
spec:
  selector:
    matchLabels:
      app: sitecore
      role: xp-sql
  template:
    metadata:
      labels:
        app: sitecore
        role: xp-sql
    spec:
      containers:
        - name: xp-sql
          image: [REPOSITORY]/sitecore-xp-sqldev
          imagePullPolicy: Always
          env:
          - name: SA_PASSWORD
            value: "8Tombs-Given-Clock#-arming-Alva-debut-Spine-monica-Normal-Ted-About1-chard-Easily-granddad-5Context!"
          - name: ACCEPT_EULA
            value: "Y"
          readinessProbe:
            tcpSocket:
              port: 1433
            failureThreshold: 30
            periodSeconds: 10
      nodeSelector:
        agentpool: win
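On clusters where startupProbe is available (beta and enabled by default from Kubernetes 1.18, GA in 1.20), the same TCP check can also be declared as a startup probe. A startup probe only holds off the liveness and readiness probes until it succeeds; like the readinessProbe, it does not delay the container's command, so on its own it would not change when Boot.ps1 runs. A sketch of the container fragment, reusing the thresholds from the manifest above:

```yaml
containers:
  - name: xp-sql
    # Placeholder, as in the manifest above
    image: "[REPOSITORY]/sitecore-xp-sqldev"
    startupProbe:
      tcpSocket:
        port: 1433
      # Allows up to 30 x 10s = 300s for SQL Server to start listening on 1433
      failureThreshold: 30
      periodSeconds: 10
    # Runs only after the startup probe has succeeded
    readinessProbe:
      tcpSocket:
        port: 1433
      periodSeconds: 10
```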

NandGates • Apr 28 '20 01:04

Unfortunately this issue persists even with the readinessProbe.

As best I can tell, in Kubernetes the entrypoint is invoked too early, and Invoke-SqlCmd somehow errors and then caches invalid credentials. Calling sqlcmd directly does not have this problem.

Interestingly, the error returned by Invoke-SqlCmd is:

Login failed for NT AUTHORITY\ANONYMOUS LOGON

But when I run whoami inside the container, I get (as expected):

user manager\containeradministrator

When I query the SQL login identity (using sqlcmd -Q "SELECT SUSER_NAME()"), I also get, as expected:

user manager\containeradministrator

I honestly have no explanation for this, so I'm going back to my original request: standardise on sqlcmd rather than Invoke-SqlCmd, since sqlcmd is what the base images already use in the Docker build. As mentioned, I am happy to submit a PR if it is likely to be useful and approved.

As a final note, Microsoft also seems to be standardising on sqlcmd in its tooling and in the scripts published on MSDN, so this would align with vendor practice.

NandGates • May 05 '20 23:05

May I suggest that you use the Linux images for SQL and Solr instead? I know for sure that they work in Kubernetes, and they are also faster and use fewer resources.

pbering • May 12 '20 17:05