Document inter-build caching strategies
This issue is intended to track documenting (and if necessary designing / implementing) facilities for inter-build caching.
Today Build supports intra-build caching through simple emptyDir volumes, e.g. if cache artifacts exist in /workspace or $HOME, they persist across steps for the duration of the Build. If users need to share additional volumes, they can configure their own volume[Mount]s: ... with the emptyDir type relatively easily. However, the lack of inter-build caching generally means that each build is "clean" (read: slow), which is nice for some environments, but less so for others.
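As a sketch of what intra-build caching looks like today (the step image, volume name, and mount path here are illustrative, not a real template):

```yaml
spec:
  steps:
  - image: super-builder:latest     # illustrative builder image
    volumeMounts:
    - name: build-cache
      mountPath: /workspace/.cache  # hypothetical cache location
  volumes:
  # emptyDir lives only as long as the Build's pod, so the cache is
  # shared across steps but discarded between builds.
  - name: build-cache
    emptyDir: {}
```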
Luckily, we have the good fortune to be leveraging K8s abstractions, which means we can also access persistent volumes.
We are entering territory I have yet to experiment with, so take this with a grain of salt!
The general idea is that if a Build wants to leverage a persistent cache, it would mount it, e.g.
spec:
  steps:
  - image: super-builder:latest
    volumeMounts:
    - name: persistent-cache
      mountPath: /var/super-builder/.cache
  volumes:
  - name: persistent-cache
    # Fill in your favorite persistent volume.
    persistentVolumeClaim:
      claimName: mattmoor-cache
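For completeness, a claim like `mattmoor-cache` would be created ahead of time. A minimal sketch (the access mode and storage size are assumptions, and the storage class is left to the cluster's default):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mattmoor-cache
spec:
  accessModes:
  - ReadWriteOnce     # one node mounts the cache read/write at a time
  resources:
    requests:
      storage: 10Gi   # illustrative size
```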
We can potentially use this in interesting ways that make caching optional, e.g.
=== BuildTemplate ===
spec:
  parameters:
  - name: CACHE
    description: The name of the volume to mount for caching artifacts.
    default: intra-build
  steps:
  - image: super-builder:latest
    volumeMounts:
    # Allow the user to override the volume we use as a cache.
    - name: "${CACHE}"
      mountPath: /var/super-builder/.cache
  volumes:
  # By default we provide intra-build caching via an emptyDir.
  - name: intra-build
    emptyDir: {}
=== Build ===
spec:
  template:
    name: what-is-above
    arguments:
    - name: CACHE
      value: persistent-cache
  volumes:
  - name: persistent-cache
    # Fill in your favorite persistent volume.
    persistentVolumeClaim:
      claimName: mattmoor-cache
@bparees @sclevine @ImJasonH WDYT?
It is notable, when choosing a persistent volume option, that the time it takes to attach that storage to the node may be non-zero.
If a node had previously attached a PD for another pod which has since finished, is the PD still attached to the node for a future pod? Will k8s schedule another pod that wants that PD on that node? Or is the PD attached and detached each time to simplify scheduling?
@mattmoor yeah i think it's useful. it's a concept we've wanted to add to openshift builds for a while, basically two things have kept us from doing it:
- if you're running parallel builds, you need to be sure the PV you're mounting can be mounted read/write-many, and across multiple nodes simultaneously.
- for us, since we actually do the build steps in a container that k8s is unaware of (since we talk to the docker socket), we don't have a good way to actually make the PV accessible to the container we launched. Obviously you don't have that problem.
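To make the first concern concrete: a cache shared by parallel builds would need a claim with ReadWriteMany access, which only some volume plugins (e.g. NFS-backed ones) support. A hedged sketch, where the claim name and size are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-build-cache  # hypothetical name
spec:
  accessModes:
  - ReadWriteMany    # multiple nodes may mount read/write simultaneously
  resources:
    requests:
      storage: 20Gi  # illustrative size
```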
@ImJasonH I have no sense for how smart the K8s scheduler is about persistent volumes.
@bparees Ack on the multi-write problem. I believe the "write-once" (ReadWriteOnce) PVC is somewhat smart about this, and IIUC the Pod will sit as Pending until it can take the writer lock (definitely something to confirm!). It'd be fantastic if the scheduler were aware enough of the contention to colocate the pending Pod with the Pod holding the mounted volume and elide (at least) the unmount/mount cost. Another problem with multi-write is that some build systems don't like to share their caches.
For explicitly parallel builds (e.g. Matrix), having the concept of a PVC "pool" would be neat, where the write tenancy would be modeled as the pool size. I haven't bothered looking at whether this exists, since I can imagine very few workloads that might want that kind of abstraction. :)
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale