Provide Persistent volume for caching
While migrating our operation from CircleCI to a self-hosted Tekton pipeline, we are struggling to optimize our testing pipelines.
We use Jest, and one way to speed things up is a persistent cache. Can Tekton provide support for persistent storage? If it is not supported natively, is there another solution for this?
We want to be able to manage cache for:
- jest
- yarn
- webpack
@caiocampoos You can set up a persistent cache using workspaces (https://tekton.dev/docs/pipelines/workspaces/) and a good enough Kubernetes storage class, for example.
@GijsvanDulmen Reading the docs, my understanding was that workspaces were always wiped when a pipeline finishes.
@GijsvanDulmen you are correct. I'll try workspaces. Thanks.
Thanks, @caiocampoos and @GijsvanDulmen. Workspaces are indeed an option for this.
Tekton does not provide any built-in mechanism specifically for caches; it is something that the Tasks that create and consume the cache have to manage directly through reusable workspaces. We do provide support for optional workspaces, so you can write a Task that benefits from a cache if it is available but can also be used when the cache workspace is not available.
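As a rough illustration of the optional-workspace pattern, here is a minimal sketch of a Task that uses a cache only when one is bound. The Task name, the `node:20` image, and the `yarn`/`jest` sub-directories are assumptions made for this example, not something Tekton prescribes. It also assumes Yarn 1, which honors `YARN_CACHE_FOLDER`, and Jest's `--cacheDirectory` option.

```yaml
# Hypothetical Task: names, image, and cache layout are illustrative only.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: jest-with-optional-cache
spec:
  workspaces:
    - name: source
      description: The checked-out repository.
    - name: cache
      optional: true
      description: Persistent cache shared across runs, if one is bound.
  steps:
    - name: test
      image: node:20
      workingDir: $(workspaces.source.path)
      env:
        # "true" only when the PipelineRun binds the cache workspace.
        - name: CACHE_BOUND
          value: $(workspaces.cache.bound)
        - name: CACHE_PATH
          value: $(workspaces.cache.path)
      script: |
        #!/usr/bin/env sh
        set -eu
        JEST_CACHE_ARGS=""
        if [ "${CACHE_BOUND}" = "true" ]; then
          # Point Yarn and Jest at directories on the cache volume.
          export YARN_CACHE_FOLDER="${CACHE_PATH}/yarn"
          JEST_CACHE_ARGS="--cacheDirectory=${CACHE_PATH}/jest"
        fi
        yarn install --frozen-lockfile
        # Left unquoted on purpose so an empty value expands to nothing.
        yarn jest --ci ${JEST_CACHE_ARGS}
```

When the `cache` workspace is left unbound, the same Task runs with the tools' default, non-persistent cache locations.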
The newly introduced StepActions are a good option for defining reusable steps that produce and restore a cache for a specific tool. I think these would be great additions to the Tekton catalog.
I'd be curious to hear about your experience with this. Please let us know if you feel that Tekton could/should do more in this direction.
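To make the StepAction idea a bit more concrete, here is a sketch of what a reusable cached install step might look like. Everything named here (`yarn-install-cached`, the `cachePath` and `sourcePath` params, the image) is hypothetical. StepActions started out as an alpha feature, so the `apiVersion` and any feature flag needed to enable them depend on the Tekton release in use.

```yaml
# Sketch only: check the StepAction apiVersion supported by your Tekton release.
apiVersion: tekton.dev/v1alpha1
kind: StepAction
metadata:
  name: yarn-install-cached        # hypothetical name
spec:
  params:
    - name: cachePath
      type: string
      description: Directory on a persistent workspace that holds the cache.
    - name: sourcePath
      type: string
      description: Directory that contains the checked-out sources.
  image: node:20                   # assumed image that ships Yarn 1
  env:
    # Params are passed through env rather than interpolated into the script.
    - name: YARN_CACHE_FOLDER
      value: $(params.cachePath)/yarn
    - name: SOURCE_PATH
      value: $(params.sourcePath)
  script: |
    #!/usr/bin/env sh
    set -eu
    cd "${SOURCE_PATH}"
    yarn install --frozen-lockfile
---
# A Task that reuses the step and wires its workspaces into the params.
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: install-deps-cached        # hypothetical name
spec:
  workspaces:
    - name: source
    - name: cache
  steps:
    - name: install
      ref:
        name: yarn-install-cached
      params:
        - name: cachePath
          value: $(workspaces.cache.path)
        - name: sourcePath
          value: $(workspaces.source.path)
```

Passing workspace paths in as params keeps the StepAction independent of any particular workspace name, so the same step could be reused by Tasks that name their workspaces differently.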
@afrittoli thanks a lot for the reply. I am having a hard time understanding how to use workspaces with persistentVolumeClaim at the moment.
Our use case:
We have a pipeline of about 10k tests, so we need a cache for dependencies and for Jest, which is our test runner.
My latest attempt was:
apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerTemplate
metadata:
  name: tt-github-pr-trigger-template-
spec:
  params:
    - name: revision
    - name: deploy
    - name: repo-url
    - name: author
    - name: ref
    - name: repo-full-name
    - name: pr-ref
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: pr-$(tt.params.pr-ref)-$(tt.params.author)-
      spec:
        serviceAccountName: service-account-
        pipelineRef:
          name: pipeline
        podTemplate:
          securityContext:
            fsGroup: 65532
        workspaces:
          - name: shared-data
            persistentVolumeClaim:
              claimName: pvc-cache
        params:
          - name: repo-url
            value: $(tt.params.repo-url)
          - name: revision
            value: $(tt.params.revision)
          - name: repo-full-name
            value: $(tt.params.repo-full-name)
          - name: ref
            value: $(tt.params.ref)
          - name: deploy
            value: $(tt.params.deploy)
Pipeline (I omitted a good chunk for the sake of simplicity):
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: my-pipeline
spec:
  workspaces:
    - name: shared-data
  params:
    - name: repo-url
      type: string
    - name: revision
      type: string
    - name: repo-full-name
      type: string
    - name: ref
      type: string
    - name: deploy
      type: string
  tasks:
    - name: fetch-source
      taskRef:
        resolver: cluster
        params:
          - name: kind
            value: task
          - name: name
            value: task-git-clone
          - name: namespace
            value: tekton-pipelines
      params:
        - name: url
          value: $(params.repo-url)
        - name: revision
          value: $(params.revision)
        - name: depth
          value: 2
      workspaces:
        - name: output
          workspace: shared-data
    - name: install-deps
      runAfter: ["update-status-running"]
      taskRef:
        resolver: cluster
        params:
          - name: kind
            value: task
          - name: name
            value: task-install-deps
          - name: namespace
            value: tekton-pipelines
      params:
        - name: install-script
          value: {{ .Values.install_script }}
        - name: post-install-script
          value: {{ default "echo no script" .Values.post_install_script }}
      workspaces:
        - name: source
          workspace: shared-data
Task: git-clone from docs
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: git-cli
  labels:
    app.kubernetes.io/version: "0.4"
  annotations:
    tekton.dev/pipelines.minVersion: "0.21.0"
    tekton.dev/categories: Git
    tekton.dev/tags: git
    tekton.dev/displayName: "git cli"
    tekton.dev/platforms: "linux/amd64,linux/s390x,linux/ppc64le"
spec:
  description: >-
    This task can be used to perform git operations.
    Git command that needs to be run can be passed as a script to
    the task. This task needs authentication to git in order to push
    after the git operation.
  workspaces:
    - name: source
      description: A workspace that contains the fetched git repository.
    - name: input
      optional: true
      description: |
        An optional workspace that contains the files that need to be added to git. You can
        access the workspace from your script using `$(workspaces.input.path)`, for instance:
          cp $(workspaces.input.path)/file_that_i_want .
          git add file_that_i_want
          # etc
    - name: ssh-directory
      optional: true
      description: |
        A .ssh directory with private key, known_hosts, config, etc. Copied to
        the user's home before git commands are executed. Used to authenticate
        with the git remote when performing the clone. Binding a Secret to this
        Workspace is strongly recommended over other volume types.
    - name: basic-auth
      optional: true
      description: |
        A Workspace containing a .gitconfig and .git-credentials file. These
        will be copied to the user's home before any git commands are run. Any
        other files in this Workspace are ignored. It is strongly recommended
        to use ssh-directory over basic-auth whenever possible and to bind a
        Secret to this Workspace over other volume types.
  params:
    - name: BASE_IMAGE
      description: |
        The base image for the task.
      type: string
      # TODO: Deprecate use of root image.
      default: cgr.dev/chainguard/git:root-2.39@sha256:7759f87050dd8bacabe61354d75ccd7f864d6b6f8ec42697db7159eccd491139
    - name: GIT_USER_NAME
      type: string
      description: |
        Git user name for performing git operation.
      default: ""
    - name: GIT_USER_EMAIL
      type: string
      description: |
        Git user email for performing git operation.
      default: ""
    - name: GIT_SCRIPT
      description: The git script to run.
      type: string
      default: |
        git help
    - name: USER_HOME
      description: |
        Absolute path to the user's home directory. Set this explicitly if you are running the image as a non-root user or have overridden
        the gitInitImage param with an image containing custom user configuration.
      type: string
      default: "/root"
    - name: VERBOSE
      description: Log the commands that are executed during `git-clone`'s operation.
      type: string
      default: "true"
  results:
    - name: commit
      description: The precise commit SHA after the git operation.
  steps:
    - name: git
      image: $(params.BASE_IMAGE)
      workingDir: $(workspaces.source.path)
      env:
        - name: HOME
          value: $(params.USER_HOME)
        - name: PARAM_VERBOSE
          value: $(params.VERBOSE)
        - name: PARAM_USER_HOME
          value: $(params.USER_HOME)
        - name: WORKSPACE_OUTPUT_PATH
          value: $(workspaces.output.path)
        - name: WORKSPACE_SSH_DIRECTORY_BOUND
          value: $(workspaces.ssh-directory.bound)
        - name: WORKSPACE_SSH_DIRECTORY_PATH
          value: $(workspaces.ssh-directory.path)
        - name: WORKSPACE_BASIC_AUTH_DIRECTORY_BOUND
          value: $(workspaces.basic-auth.bound)
        - name: WORKSPACE_BASIC_AUTH_DIRECTORY_PATH
          value: $(workspaces.basic-auth.path)
      script: |
        #!/usr/bin/env sh
        set -eu
        if [ "${PARAM_VERBOSE}" = "true" ] ; then
          set -x
        fi
        if [ "${WORKSPACE_BASIC_AUTH_DIRECTORY_BOUND}" = "true" ] ; then
          cp "${WORKSPACE_BASIC_AUTH_DIRECTORY_PATH}/.git-credentials" "${PARAM_USER_HOME}/.git-credentials"
          cp "${WORKSPACE_BASIC_AUTH_DIRECTORY_PATH}/.gitconfig" "${PARAM_USER_HOME}/.gitconfig"
          chmod 400 "${PARAM_USER_HOME}/.git-credentials"
          chmod 400 "${PARAM_USER_HOME}/.gitconfig"
        fi
        if [ "${WORKSPACE_SSH_DIRECTORY_BOUND}" = "true" ] ; then
          cp -R "${WORKSPACE_SSH_DIRECTORY_PATH}" "${PARAM_USER_HOME}"/.ssh
          chmod 700 "${PARAM_USER_HOME}"/.ssh
          chmod -R 400 "${PARAM_USER_HOME}"/.ssh/*
        fi
        # Setting up the config for the git.
        git config --global user.email "$(params.GIT_USER_EMAIL)"
        git config --global user.name "$(params.GIT_USER_NAME)"
        eval '$(params.GIT_SCRIPT)'
        RESULT_SHA="$(git rev-parse HEAD | tr -d '\n')"
        EXIT_CODE="$?"
        if [ "$EXIT_CODE" != 0 ]
        then
          exit $EXIT_CODE
        fi
        # Make sure we don't add a trailing newline to the result!
        printf "%s" "$RESULT_SHA" > "$(results.commit.path)"
I can use the git clone task and the AWS CLI with the default secrets just fine when I mount my workspace like this:
workspaces:
  - name: shared-data
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
But when I try to set up a PV and PVC and pass it as persistentVolumeClaim, I can no longer access those credentials.
We are migrating our pipeline from CircleCI and I am very new to Tekton, so I would really appreciate some help here. From the docs and examples alone, I couldn't figure out how to use workspaces for a persistent cache.
I want to add that in previous tests I was able to persist data across pipeline runs and share it between tasks. I am just confused about how to set up credentials for git or other tasks that use them.
@caiocampoos note that you can have multiple workspaces: one with volumeClaimTemplate (which gets deleted when the PipelineRun is deleted) for the sources, and another one (backed by a PVC and persistentVolumeClaim) for the cache.
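For the cache side of that split, the claim has to exist before any PipelineRun references it. A minimal sketch, assuming the `pvc-cache` name used earlier in this thread and the cluster's default storage class (the size is an arbitrary placeholder):

```yaml
# Pre-created claim for the long-lived cache. The size is an assumption, and
# storageClassName is omitted so the cluster default storage class is used.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-cache
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Because this claim is not created from a volumeClaimTemplate, it is not tied to any single PipelineRun and its contents stay available to later runs.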
@vdemeester I followed this path and added a cache with persistentVolumeClaim, but I get this error when the task that uses both runs:
It disappears if I disable coschedule, but then I get another problem: multiple TaskRuns stuck in a pending state on my node.
For something like this:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: pr-$(tt.params.pr-ref)-$(tt.params.author)-
spec:
  serviceAccountName: service-account-{{ .Values.projectName }}
  pipelineRef:
    name: {{ .Values.projectName}}-pipeline
  workspaces:
    - name: cache
      persistentVolumeClaim:
        claimName: pvc-cache-{{ .Values.projectName }}
    - name: shared-data
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
I can make it work by disabling the affinity assistant, but running multiple parallel tasks floods my node with pods, making everything slower than before. I can't see how to optimize this for a single node.
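For anyone following along, the affinity-assistant switch mentioned above lives in the `feature-flags` ConfigMap in the Tekton Pipelines namespace. The fragment below is only a sketch of the relevant key. It assumes a release that supports the `coschedule` flag and the default `tekton-pipelines` install namespace; older releases use `disable-affinity-assistant: "true"` instead.

```yaml
# Fragment only: the real ConfigMap carries many other keys, so edit it in
# place (for example with kubectl edit) rather than applying this file as-is.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  # "workspaces" is the default co-scheduling behaviour; "disabled" turns the
  # affinity assistant off so a TaskRun can mount more than one PVC workspace.
  coschedule: "disabled"
```

With the assistant turned off, Tekton no longer steers TaskRuns that share a PVC onto the same node, so on a multi-node cluster a ReadWriteOnce cache volume can leave pods pending.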