agent-stack-k8s

Fetching SSH creds for git extremely unreliable in a seemingly random way

Open c2h5oh opened this issue 1 year ago • 3 comments

Changes to podSpec result in unpredictable checkout failures, example:

steps:
  - label: ':pipeline: Pipeline Setup'
    agents:
      queue: my-ci
    plugins:
      - kubernetes:
          gitEnvFrom:
            - secretRef:
                name: buildkite-agent-ssh
          podSpec:
            containers:
              - image: 'buildkite/agent:latest'
                command:
                  - buildkite-agent
                args:
                  - pipeline upload

Works perfectly - 100% success rate

Adding an extra container results in 100% failure rate on checkout stage - missing creds - before any of the containers in spec are started. I have managed to trigger this by modifications as small as an additional space between args

steps:
  - label: ':pipeline: Pipeline Setup'
    agents:
      queue: my-ci
    plugins:
      - kubernetes:
          gitEnvFrom:
            - secretRef:
                name: buildkite-agent-ssh
          podSpec:
            containers:
              - image: 'buildkite/agent:latest'
                command:
                  - buildkite-agent
                args:
                  - pipeline upload
              - image: 'buildkite/agent:latest'
                command:
                  - /bin/bash
                args:
                  - echo $${BUILDKITE_BRANCH}

Adding

env:
    GIT_SSH_COMMAND: ssh -vvv

yields

[...]
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type 0
debug1: identity file /root/.ssh/id_rsa-cert type -1
[...]

for the first example

[...]
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
[...]

for the second one

With

env:
    GIT_SSH_COMMAND: ssh -i /workspace/.ssh -vvv 

second example reports that /workspace/.ssh doesn't exist

With

env:
    BUILDKITE_GIT_CLONE_FLAGS: "--depth=1"

That flag is passed to git command in first example, but not in the second.

c2h5oh avatar Dec 10 '23 01:12 c2h5oh

Hi @c2h5oh thanks for the detailed bug report.

I'm in the middle of overhauling how this works. I will try to make sure the new system works with the pipeline YAML you've provided. It would help immensely if you also send a link to the jobs to [email protected].

triarius avatar Dec 10 '23 06:12 triarius

@triarius done

c2h5oh avatar Dec 10 '23 14:12 c2h5oh

Apologies for the delay @c2h5oh. I've had a look, and it seems that what writes the git credentials to the .ssh directory is a script in the buildkite-agent docker container called ssh-env-config.sh. It's contained in the default entrypoint for the container, but when you override that in the PodSpec, it's no longer run. This is far from well documented, so I'll work on adding something to the README about it.
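To illustrate why overriding the entrypoint loses the credentials, here is a minimal sketch of the general "set up, then run the real command" wrapper pattern. It is not the actual contents of ssh-env-config.sh: the function name setup_then_run, the variable DEMO_PRIVATE_KEY_B64 and the temporary paths are all hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an entrypoint wrapper; NOT the real ssh-env-config.sh.
set -euo pipefail

# Write a base64-encoded key from the environment into <ssh_dir>/id_rsa,
# then run the remaining arguments as the actual command.
setup_then_run() {
  local ssh_dir="$1"; shift
  if [ -n "${DEMO_PRIVATE_KEY_B64:-}" ]; then
    mkdir -p "$ssh_dir"
    printf '%s' "$DEMO_PRIVATE_KEY_B64" | base64 --decode > "$ssh_dir/id_rsa"
    chmod 600 "$ssh_dir/id_rsa"
  fi
  "$@"
}

demo_dir="$(mktemp -d)"
DEMO_PRIVATE_KEY_B64="$(printf 'not-a-real-key' | base64)"
export DEMO_PRIVATE_KEY_B64

# Small check that reports whether a key file exists in the given directory.
check='test -f "$1/id_rsa" && echo present || echo missing'

# Through the wrapper: the key file is written before the command runs.
with_wrapper="$(setup_then_run "$demo_dir/a" bash -c "$check" _ "$demo_dir/a")"

# Command overridden, wrapper skipped: nothing ever writes the key file.
without_wrapper="$(bash -c "$check" _ "$demo_dir/b")"

echo "with wrapper:    $with_wrapper"
echo "without wrapper: $without_wrapper"
rm -rf "$demo_dir"
```

The second case mirrors the `id_rsa type -1` line in the ssh debug output above: the secret is still in the environment, but nothing has turned it into a key file.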

So if you want to run a bash command with the ssh credentials written to a file, you can do something like:

steps:
  - label: ':pipeline: Pipeline Setup'
    agents:
      queue: my-ci
    plugins:
      - kubernetes:
          gitEnvFrom:
            - secretRef:
                name: buildkite-agent-ssh
          podSpec:
            containers:
              - image: 'buildkite/agent:latest'
                command:
                  - buildkite-agent
                args:
                  - pipeline upload
              - image: 'buildkite/agent:latest'
                command:
                  - ssh-env-config.sh
                args:
                  - bash
                  - -c
                  - "'echo $${BUILDKITE_BRANCH}'"

Note that there are a few subtleties when writing bash commands as Kubernetes container args.

  1. You have to have separate arguments for bash, -c and the command you want to run.
  2. The command you want to run should be enclosed in a single quote. Because YAML strips a layer of quotes when parsing YAML strings, I've put a dummy layer of double quotes around the single quotes.
  3. You need to escape the $ sign, but you already had that in your example.

After doing that, you can get something like this to work reliably.

[Screenshot: 2024-02-09-18-40-19]

Note: private was my branch name.
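The first two subtleties can be reproduced in a plain shell, outside Kubernetes. This is a rough sketch: the joined-string case assumes the command and args end up concatenated into one string that a shell parses again, which is an assumption made for illustration and not something confirmed in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The whole command is ONE argument after -c, so bash runs it as the script.
one_arg="$(bash -c 'echo hello world')"

# Split into separate arguments: only "echo" is the script, while
# "hello" and "world" become $0 and $1 and are never printed.
split_args="$(bash -c echo hello world)"

# Assumed case: command and args joined into one string and parsed again.
# The inner single quotes keep the command together through the second parse.
joined_quoted="$(sh -c "bash -c 'echo hello world'")"
joined_unquoted="$(sh -c "bash -c echo hello world")"

echo "one_arg=[$one_arg]"
echo "split_args=[$split_args]"
echo "joined_quoted=[$joined_quoted]"
echo "joined_unquoted=[$joined_unquoted]"
```

Only the variants that keep the command as a single argument print `hello world`; the others print an empty line.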

triarius avatar Feb 09 '24 07:02 triarius