
Document inter-build caching strategies

Open mattmoor opened this issue 7 years ago • 6 comments

This issue is intended to track documenting (and if necessary designing / implementing) facilities for inter-build caching.

mattmoor avatar Feb 16 '18 16:02 mattmoor

Today Build supports intra-build caching through simple emptyDir volumes. For example, if cache artifacts exist in /workspace or $HOME, they persist across steps for the duration of the Build. If users need to share additional volumes, they can configure their own volume[Mount]s: ... with the emptyDir type relatively easily. However, the lack of inter-build caching generally means that each build is "clean" (read: slow), which is nice for some environments, but less so for others.
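The intra-build pattern described above can be sketched as follows; the step image (`super-builder:latest`), volume name, and mount path are illustrative assumptions, not taken from a real template:

```yaml
# Minimal sketch of intra-build caching via an emptyDir volume.
# Image name, volume name, and mount path are hypothetical.
spec:
  steps:
  - image: super-builder:latest
    volumeMounts:
    - name: scratch-cache
      mountPath: /var/super-builder/.cache
  - image: super-builder:latest   # a later step sees the same cache contents
    volumeMounts:
    - name: scratch-cache
      mountPath: /var/super-builder/.cache

  volumes:
  # emptyDir lives only as long as the Build's Pod, so the cache is
  # shared across steps but discarded between builds.
  - name: scratch-cache
    emptyDir: {}
```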

Luckily, we have the good fortune to be leveraging K8s abstractions, which means we can also access persistent volumes.

We are entering territory I have yet to experiment with, so take this with a grain of salt!

The general idea is that if a Build wants to leverage a persistent cache, it would mount it, e.g.

spec:
  steps:
  - image: super-builder:latest
    volumeMounts:
    - name: persistent-cache
      mountPath: /var/super-builder/.cache

  volumes:
  - name: persistent-cache
    # Fill in your favorite persistent volume.
    persistentVolumeClaim:
      claimName: mattmoor-cache
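For completeness, a claim like `mattmoor-cache` would be created ahead of time as an ordinary PersistentVolumeClaim. A minimal sketch, where the storage size, storage class, and access mode are assumptions:

```yaml
# Hypothetical claim backing the Build above; the requested size and
# access mode are illustrative, not prescribed by Build.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mattmoor-cache
spec:
  accessModes:
  - ReadWriteOnce        # single-node read/write; parallel builds need more (see below)
  resources:
    requests:
      storage: 10Gi
```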

We can potentially use this in interesting ways that make caching optional, e.g.

=== BuildTemplate ===
spec:
  parameters:
  - name: CACHE
    description: The name of the volume to mount for caching artifacts.
    default: intra-build

  steps:
  - image: super-builder:latest
    volumeMounts:
    # Allow the user to override the volume we use as a cache.
    - name: "${CACHE}"
      mountPath: /var/super-builder/.cache

  volumes:
  # By default we provide intra-build caching via an emptyDir
  - name: intra-build
    emptyDir: {}

=== Build ===
spec:
  template:
    name: what-is-above
    arguments:
    - name: CACHE
      value: persistent-cache

  volumes:
  - name: persistent-cache
    # Fill in your favorite persistent volume.
    persistentVolumeClaim:
      claimName: mattmoor-cache

@bparees @sclevine @ImJasonH WDYT?

mattmoor avatar Feb 16 '18 18:02 mattmoor

It is worth noting, when choosing a persistent volume option, that the time it takes to attach that storage to the node may be non-zero.

mattmoor avatar Feb 16 '18 18:02 mattmoor

If a node had previously attached a PD for another pod which has since finished, is the PD still attached to the node for a future pod? Will k8s schedule another pod that wants that PD on that node? Or is the PD attached and detached each time to simplify scheduling?


imjasonh avatar Feb 16 '18 18:02 imjasonh

@mattmoor yeah, I think it's useful. It's a concept we've wanted to add to OpenShift builds for a while; basically two things have kept us from doing it:

  1. If you're running parallel builds, you need to be sure the PV you're mounting supports read/write-many access, across multiple nodes simultaneously.

  2. For us, since we actually run the build steps in a container that k8s is unaware of (we talk to the docker socket directly), we don't have a good way to make the PV accessible to the container we launched. Obviously you don't have that problem.
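Point 1 maps onto the PVC access mode. A sketch of a claim requesting multi-node read/write access; the name, size, and storage class are assumptions, and ReadWriteMany is only honored by volume plugins that support it (e.g. NFS or CephFS):

```yaml
# Hypothetical claim for parallel builds sharing one cache.
# RWX support depends on the underlying volume plugin; size is assumed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-build-cache
spec:
  accessModes:
  - ReadWriteMany        # multiple nodes may mount read/write simultaneously
  resources:
    requests:
      storage: 20Gi
```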

bparees avatar Feb 16 '18 18:02 bparees

@ImJasonH I have no sense for how smart the K8s scheduler is about persistent volumes.

@bparees Ack on the multi-write problem. I believe the ReadWriteOnce PVC is somewhat smart about this, and IIUC the Pod will sit as Pending until it can take the writer lock (definitely something to confirm!). It would be fantastic if the scheduler were aware enough of the contention to colocate the pending Pod with the Pod currently mounting the volume, and thereby elide (at least) the unmount/mount cost. Another problem with multi-write is that some build systems don't like to share.

For explicitly parallel builds (e.g. Matrix), having the concept of a PVC "pool" would be neat, where the write tenancy would be modeled as the pool size. I haven't bothered looking at whether this exists, since I can imagine very few workloads that might want that kind of abstraction. :)

mattmoor avatar Feb 16 '18 18:02 mattmoor

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale