agent-stack-k8s
Fetching SSH creds for git is extremely unreliable in a seemingly random way
Changes to podSpec result in unpredictable checkout failures, example:
steps:
- label: ':pipeline: Pipeline Setup'
  agents:
    queue: my-ci
  plugins:
  - kubernetes:
      gitEnvFrom:
      - secretRef:
          name: buildkite-agent-ssh
      podSpec:
        containers:
        - image: 'buildkite/agent:latest'
          command:
          - buildkite-agent
          args:
          - pipeline upload
Works perfectly - 100% success rate
Adding an extra container results in a 100% failure rate at the checkout stage (missing creds), before any of the containers in the spec are started. I have managed to trigger this with modifications as small as an additional space between args.
steps:
- label: ':pipeline: Pipeline Setup'
  agents:
    queue: my-ci
  plugins:
  - kubernetes:
      gitEnvFrom:
      - secretRef:
          name: buildkite-agent-ssh
      podSpec:
        containers:
        - image: 'buildkite/agent:latest'
          command:
          - buildkite-agent
          args:
          - pipeline upload
        - image: 'buildkite/agent:latest'
          command:
          - /bin/bash
          args:
          - echo $${BUILDKITE_BRANCH}
Adding
env:
  GIT_SSH_COMMAND: ssh -vvv
yields, for the first example:
[...]
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type 0
debug1: identity file /root/.ssh/id_rsa-cert type -1
[...]
and for the second one:
[...]
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
[...]
With
env:
  GIT_SSH_COMMAND: ssh -i /workspace/.ssh -vvv
the second example reports that /workspace/.ssh doesn't exist.
With
env:
  BUILDKITE_GIT_CLONE_FLAGS: "--depth=1"
the flag is passed to the git command in the first example, but not in the second.
Hi @c2h5oh thanks for the detailed bug report.
I'm in the middle of overhauling how this works. I will try to make sure the new system works with the pipeline YAML you've provided. It would help immensely if you also send a link to the jobs to [email protected].
@triarius done
Apologies for the delay @c2h5oh. I've had a look, and it seems that what writes the git credentials to the .ssh directory is a script in the buildkite-agent docker container called ssh-env-config.sh. It's contained in the default entrypoint for the container, but when you override that in the PodSpec, it's no longer run. This is far from well documented, so I'll work on adding something to the README about it.
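To illustrate why overriding the entrypoint loses the credentials, here is a simplified sketch of what a wrapper of this kind does. It is not the real ssh-env-config.sh: the function names are made up, and SSH_PRIVATE_RSA_KEY is an assumed name for the variable carrying the key material.

```shell
#!/bin/sh
# Simplified sketch of an entrypoint wrapper; NOT the real ssh-env-config.sh.
# Assumption: the key material arrives in SSH_PRIVATE_RSA_KEY.

# Write the key from the environment into the given .ssh directory.
write_ssh_key() {
  ssh_dir="$1"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  if [ -n "${SSH_PRIVATE_RSA_KEY:-}" ]; then
    printf '%s\n' "$SSH_PRIVATE_RSA_KEY" > "$ssh_dir/id_rsa"
    chmod 600 "$ssh_dir/id_rsa"
  fi
}

# Entrypoint behaviour: write the key, then hand over to the real command.
# If the container command bypasses the wrapper, write_ssh_key never runs
# and ssh finds no identity file.
run_with_ssh_key() {
  write_ssh_key "$HOME/.ssh"
  exec "$@"
}
```

In this sketch, `run_with_ssh_key bash -c '...'` plays the role of using the wrapper as the container command, with the real command passed as its arguments.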
So if you want to run a bash command with the ssh credentials written to a file, you can do something like:
steps:
- label: ':pipeline: Pipeline Setup'
  agents:
    queue: my-ci
  plugins:
  - kubernetes:
      gitEnvFrom:
      - secretRef:
          name: buildkite-agent-ssh
      podSpec:
        containers:
        - image: 'buildkite/agent:latest'
          command:
          - buildkite-agent
          args:
          - pipeline upload
        - image: 'buildkite/agent:latest'
          command:
          - ssh-env-config.sh
          args:
          - bash
          - -c
          - "'echo $${BUILDKITE_BRANCH}'"
Note that there are a few subtleties when writing bash commands as Kubernetes container args.
- You have to have separate arguments for bash, -c and the command you want to run.
- The command you want to run should be enclosed in single quotes. Because YAML strips a layer of quotes when parsing YAML strings, I've put a dummy layer of double quotes around the single quotes.
- You need to escape the $ sign, but you already had that in your example.
After doing that, you can get something like this to work reliably.
Note: private was my branch name.
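The effect of the inner single quotes can be reproduced locally. This is only a sketch, under the assumption that the container command and args end up joined into one string that a shell evaluates; the thread does not spell out that mechanism. The variable names are illustrative, and the branch value is the one from the note above.

```shell
#!/bin/sh
# Assumption: command and args are joined with spaces and evaluated by a shell.
BUILDKITE_BRANCH=private
export BUILDKITE_BRANCH

# With the inner single quotes, bash -c receives the whole command as one
# argument, so the inner bash expands the variable and prints the branch.
quoted="bash -c 'echo \${BUILDKITE_BRANCH}'"
quoted_out=$(sh -c "$quoted")

# Without them, bash -c receives only "echo"; the branch name becomes $0
# of that script, so only an empty line is printed.
unquoted="bash -c echo \${BUILDKITE_BRANCH}"
unquoted_out=$(sh -c "$unquoted")

echo "quoted:   [$quoted_out]"
echo "unquoted: [$unquoted_out]"
```

The first line of output shows the branch name between the brackets and the second shows nothing between them, which is why the command has to stay a single quoted argument after the -c.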