docker-images
SQL container fails to boot when run in Kubernetes
Mapping the provided docker-compose files to a Kubernetes manifest leads to the SQL container failing to start.
I am using the completely unmodified Docker containers generated using Build.ps1, and the only change of note has been mapping the environment variables into Kubernetes format. No other code or architectural changes have been made.
I have been triaging this for some time now and have isolated it to the use of Invoke-SqlCmd, which I believe has some strange timing issue when a Deployment initiates the boot of the container. Despite there being no password applied, the container returns
Login failed for NT AUTHORITY/ANONYMOUS LOGON
By contrast, sqlcmd works flawlessly at all times.
Could we please consider removing Invoke-SqlCmd from Boot.ps1 in both
- windows\9.3.x\sitecore-xp-sqldev
- \windows\9.x.x\sitecore-xm-sqldev
and instead replacing it with sqlcmd -Q, which is much more robust? We are already using this in the base mssql layer at windows\dependencies\mssql-developer-2017\Start.ps1 and it would be good to standardise.
I am literally running the below snippet in my Kubernetes manifest to replace Invoke-SqlCmd with sqlcmd, which is working as expected.
command: ['powershell']
args: ['-c', '(Get-Content ./Boot.ps1) -replace "Invoke-SqlCmd -Query", "sqlcmd -Q" | Out-File Boot.ps1; C:/Boot.ps1 -InstallPath $env:INSTALL_PATH -DataPath $env:DATA_PATH']
I am happy to supply a PR if you believe this change is useful and likely to be approved.
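For readers who want to try the same workaround, here is a minimal sketch of where that override sits in a container spec. It uses a folded block scalar so the PowerShell does not need nested quoting. The container name, the placeholder registry, and the absolute path C:/Boot.ps1 for both the read and the write are assumptions for illustration; I have only run the one-line form shown above.

```yaml
# Sketch only: name, image and the C:/Boot.ps1 path are illustrative assumptions.
containers:
  - name: xm-sql
    image: "<registry>/sitecore-xm-sqldev"   # placeholder registry
    command: ["powershell"]
    args:
      - "-c"
      # ">-" folds the lines below into a single command line for PowerShell.
      - >-
        (Get-Content C:/Boot.ps1) -replace 'Invoke-SqlCmd -Query', 'sqlcmd -Q'
        | Out-File C:/Boot.ps1;
        C:/Boot.ps1 -InstallPath $env:INSTALL_PATH -DataPath $env:DATA_PATH
```

The replacement only rewrites the literal text "Invoke-SqlCmd -Query", so any call that passes other Invoke-SqlCmd parameters would still need to be checked by hand.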
Can you share K8s manifest templates?
Hi @bplasmeijer,
Thanks for the response, it is much appreciated. I'm aware you are busy!
I've included the Kubernetes deployment manifest below. For completeness, the environment details are:
- Running in an Azure Kubernetes Service
- Running on a Windows node
- There is no magic other than pointing the manifest at an ACR that holds the built Sitecore image
I'm starting to believe this is a timing issue to do with the CMD being executed too early, so I am considering implementing a health probe. If this addresses the issue I will post here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sitecore-sql-xm
  namespace: sitecore
  labels:
    app: sitecore
spec:
  selector:
    matchLabels:
      app: sitecore
      role: xm-sql
  template:
    metadata:
      labels:
        app: sitecore
        role: xm-sql
    spec:
      containers:
      - name: xm-sql
        image: [REGISTRY]/sitecore-xm-sqldev
        imagePullPolicy: Always
        env:
        - name: SA_PASSWORD
          value: "8Tombs-Given-Clock#-arming-Alva-debut-Spine-monica-Normal-Ted-About1-chard-Easily-granddad-5Context!"
        - name: ACCEPT_EULA
          value: "Y"
      nodeSelector:
        agentpool: win
I can confirm this is due to a timing issue with container readiness.
Unfortunately, AKS currently does not support startupProbe declarations (https://github.com/Azure/AKS/issues/1550), but a readinessProbe as below has solved the issue.
Closing but also posting the solution below for any future readers.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sitecore-sql-xp
  namespace: sitecore
  labels:
    app: sitecore
spec:
  selector:
    matchLabels:
      app: sitecore
      role: xp-sql
  template:
    metadata:
      labels:
        app: sitecore
        role: xp-sql
    spec:
      containers:
      - name: xp-sql
        image: [REPOSITORY]/sitecore-xp-sqldev
        imagePullPolicy: Always
        env:
        - name: SA_PASSWORD
          value: "8Tombs-Given-Clock#-arming-Alva-debut-Spine-monica-Normal-Ted-About1-chard-Easily-granddad-5Context!"
        - name: ACCEPT_EULA
          value: "Y"
        readinessProbe:
          tcpSocket:
            port: 1433
          failureThreshold: 30
          periodSeconds: 10
      nodeSelector:
        agentpool: win
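For anyone on a cluster where startupProbe is available, the equivalent declaration might look like the fragment below. This is an untested sketch: it reuses the port and thresholds from the readinessProbe, and the container name and placeholder registry are illustrative only.

```yaml
# Sketch only: assumes a cluster that supports startupProbe.
# Port and thresholds mirror the readinessProbe; they are not tuned values.
containers:
  - name: xp-sql
    image: "<registry>/sitecore-xp-sqldev"   # placeholder registry
    startupProbe:
      tcpSocket:
        port: 1433
      failureThreshold: 30   # allows up to 30 x 10s = 300s for port 1433 to open
      periodSeconds: 10
```

With a startupProbe, Kubernetes holds back the other probes until it succeeds, and restarts the container if the failure threshold is exhausted first.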
Unfortunately this issue persists even with the readinessProbe.
As best I can tell, in Kubernetes the entrypoint is being invoked too early, and somehow Invoke-SqlCmd is erroring and thus caching invalid credentials. Calling sqlcmd directly does not have this problem.
Interestingly, the error returned from Invoke-SqlCmd is
Login failed for NT AUTHORITY/ANONYMOUS LOGON
But when I run whoami inside the container I get (as expected)
usermanager/container administrator
When I get the SQL identity (using sqlcmd -Q "SUSER_NAME()") then, as expected, I get
usermanager/container administrator
I honestly have no explanation for this, so I'm reverting to my original request, which is to standardise on sqlcmd rather than Invoke-SqlCmd, as this is what is used by the base images in the Docker process. As mentioned, I am happy to submit a PR if it is likely to be useful and approved.
As a final note, Microsoft seem to be standardising on the use of sqlcmd in their tooling and scripts on MSDN also, so this would align to vendor practice.
May I suggest that you use the Linux images for SQL and Solr instead? I know for sure that they work in Kubernetes, and they are also faster and use fewer resources.