source-controller

Add support for using an OCI image as source

Open nebhale opened this issue 2 years ago • 25 comments

This change defines an OCIRepository CRD that allows a user to specify a given image to use as a source. The contents of the image (including images with multiple layers) are converted into a TAR and exposed to consumers following the same conventions as the other source types.

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: OCIRepository
metadata:
  name: podinfo-deploy
  namespace: flux-system
spec:
  interval: 10m
  url: ghcr.io/stefanprodan/podinfo-deploy
  ref:
    # one of
    tag: "latest"
    digest: "sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2"
    semver: "1.x"
  auth:
    # one of
    secretRef:
      name: regcred
    serviceAccountName: reg

In addition to the url, interval, timeout, ignore, and suspend keys (all of which behave consistently with the existing source types), this CRD also defines authentication via both an image pull Secret and a serviceAccountName, which provide ways to contribute registry connection credentials for the specified image.

This change also adds a new way to write to the storage archive by streaming data from an incoming TAR without writing it to the filesystem. A couple of code and test functions were extracted to reuse common functionality for both archive strategies.


Formerly: This change defines an OCIImage CRD that allows a user to specify a given image to use as source. The contents of the image (including images with multiple layers) are converted into a TAR and exposed to consumers following the same conventions as the other source types.

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: OCIImage
metadata:
  name: ociimage-sample
spec:
  image:    index.docker.io/stefanprodan/podinfo:latest
  interval: 1m

In addition to the image, interval, timeout, ignore, and suspend keys (all of which behave consistently with the existing source types) this CRD also defines both imagePullSecrets and serviceAccountName keys which provide ways to contribute registry connection credentials for the specified image.

This change also adds a new way to write to the storage archive by streaming data from an incoming TAR without writing it to the filesystem. A couple of code and test functions were extracted to reuse common functionality for both archive strategies.


Note: The controller does not yet have any tests, although I do plan to contribute them. I took a look at gitrepository_controller_test.go, noticed that it made reference to a package in another repo, and decided to ask for your advice on testing first. Should I make a coordinated separate contribution there, add an internal package here, or something else?

nebhale avatar Oct 08 '21 21:10 nebhale

Thank you very much for this initial contribution @nebhale. We are at present in the process of bringing some standardization to the set of controllers, including better tests, and you may want to have a look at the reconcilers-dev branch to see how you can best structure the tests. Cheers!

hiddeco avatar Oct 09 '21 06:10 hiddeco

Hi @nebhale could you please take a look at my proposal for the API here: https://github.com/fluxcd/flux2/discussions/1705#discussioncomment-1153761

Here is an example of how I see the API:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: OCIRepository
metadata:
  name: podinfo-release-candidates
  namespace: flux-system
spec:
  interval: 10m
  image: ghcr.io/stefanprodan/podinfo-deploy
  secretRef:
    name: regcred
  filterTags:
    pattern: '.*-rc.*'
  policy:
    semver:
      range: '^1.x-0'

stefanprodan avatar Oct 09 '21 07:10 stefanprodan

@stefanprodan I'll update to match your proposal.

nebhale avatar Oct 11 '21 13:10 nebhale

I would rather see this as a self-contained controller. Sources should really be a point of extension (that is, people should be able to provide their own kinds of source as self-contained controllers, from which users can pick and choose). At present this isn't possible because the source types are hard-coded into the controllers that use sources. @nebhale Would you be willing to collaborate on a design for making sources extensible? I know @hiddeco has already thought a lot about it.

My other concern is that the API here ignores the existing API for specifying image repositories and selecting amongst the images therein: https://fluxcd.io/docs/components/image/imagepolicies/. The types there are for selecting container images for automating updates to workloads, so a different purpose (though it's conceivable you could automate updates to config images ...). Nonetheless a user might reasonably ask "Why can't I specify an image source using a semver range?" or just "Why are there two different schemes for referring to images?".

squaremo avatar Oct 11 '21 16:10 squaremo

@stefanprodan as I was implementing I was thinking about where policy should exist in the hierarchy and I was wondering what you thought about moving it under a filterTags level. This feels right because the policy only applies to the tag filtering where there are multiple candidates to choose from, while at the top level (digest and tag) there's only a single version. Some examples of the CR:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: ArtifactRepository
metadata:
  name: podinfo-release-candidates
  namespace: flux-system
spec:
  interval: 10m
  image: ghcr.io/stefanprodan/podinfo-deploy
  secretRef:
    name: regcred
  digest: sha256:d8ba3e9dc883b854a30c40ccdf6d6653b26868c7b77351d4c91eaffaf662611e
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: ArtifactRepository
metadata:
  name: podinfo-release-candidates
  namespace: flux-system
spec:
  interval: 10m
  image: ghcr.io/stefanprodan/podinfo-deploy
  secretRef:
    name: regcred
  tag: '1.0.0'
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: ArtifactRepository
metadata:
  name: podinfo-release-candidates
  namespace: flux-system
spec:
  interval: 10m
  image: ghcr.io/stefanprodan/podinfo-deploy
  secretRef:
    name: regcred
  filterTags:
    pattern: '^main-[a-fA-F0-9]+-(?P<ts>.*)'
    extract: '$ts'
    policy:
      numerical:
        order: asc
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: ArtifactRepository
metadata:
  name: podinfo-release-candidates
  namespace: flux-system
spec:
  interval: 10m
  image: ghcr.io/stefanprodan/podinfo-deploy
  secretRef:
    name: regcred
  filterTags:
    pattern: '.*-rc.*'
    policy:
      semver:
        range: '^1.x-0'
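
To make the pattern/extract/numerical example concrete, here is a rough Go sketch of how such a filter could be evaluated: keep the tags that match the pattern, expand the extract template against each match, and pick the tag with the highest numeric value. The function name latestByNumber and the sample tags are made up, and it assumes that "order: asc" means the highest number is treated as latest.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// latestByNumber keeps the tags matching pattern, expands the extract
// template (for example "$ts") against each match, parses the result as
// a number, and returns the tag with the highest value.
// (Hypothetical helper; assumes "asc" means the highest value wins.)
func latestByNumber(tags []string, pattern, extract string) (string, bool) {
	re := regexp.MustCompile(pattern)
	best, bestVal, found := "", int64(0), false
	for _, tag := range tags {
		m := re.FindStringSubmatchIndex(tag)
		if m == nil {
			continue // tag does not match the filter
		}
		extracted := string(re.ExpandString(nil, extract, tag, m))
		val, err := strconv.ParseInt(extracted, 10, 64)
		if err != nil {
			continue // extracted value is not numeric
		}
		if !found || val > bestVal {
			best, bestVal, found = tag, val, true
		}
	}
	return best, found
}

func main() {
	// Sample tags are invented for this sketch.
	tags := []string{
		"main-a1b2c3-1633700000",
		"main-deadbeef-1633800000",
		"main-0f0f0f-1633750000",
		"v1.0.0",
	}
	tag, ok := latestByNumber(tags, `^main-[a-fA-F0-9]+-(?P<ts>.*)`, "$ts")
	fmt.Println(tag, ok) // prints "main-deadbeef-1633800000 true"
}
```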

Also, what do you think about tagFilter which feels declarative instead of filterTags which feels imperative?

nebhale avatar Oct 11 '21 22:10 nebhale

@squaremo I'd love to start engaging on the idea of source extensibility. We've got some experience in this subject and I think we can bring some alternative view points to the table.

nebhale avatar Oct 12 '21 18:10 nebhale

@squaremo I'd love to start engaging on the idea of source extensibility.

Aces! @hiddeco Do you have a preferred venue? I think it's worth aiming at producing a design document, ultimately.

squaremo avatar Oct 13 '21 14:10 squaremo

Either a hackmd.io document, or Google document would work for me (I probably need to create the outline myself, which means I also need some time). In short @nebhale:


What I envision is the Artifact becoming a separate well-known (Custom Resource) entity. Domain specific "producers" would create and maintain their set of Artifact resources (including ownerRef, garbage collection rules, etc.).

Consumers would only be aware of the Artifact type (and probably some label contract to allow for label selector queries, as I think owner references still can't be used to List).

This would allow for "wildcard" sourceRef entries, which would query the Artifact resources based on the label contract combined with the data from the ref.
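
As a rough illustration of the label contract, the following Go sketch models Artifact objects in memory and resolves a sourceRef into a label selector. Everything here is hypothetical: the label keys, the Artifact and SourceRef shapes, and the rule that an empty kind acts as a wildcard are assumptions, and a real consumer would list resources through the Kubernetes API rather than filter a slice.

```go
package main

import "fmt"

// Hypothetical label keys that producers would set on each Artifact.
const (
	kindLabel = "example.io/source-kind"
	nameLabel = "example.io/source-name"
)

// Artifact is a stand-in for the proposed standalone resource.
type Artifact struct {
	Name     string
	Labels   map[string]string
	Revision string
}

// SourceRef is a simplified reference; an empty Kind means "any kind".
type SourceRef struct {
	Kind string
	Name string
}

// selectorFor turns a reference into the labels a consumer would query on.
func selectorFor(ref SourceRef) map[string]string {
	sel := map[string]string{nameLabel: ref.Name}
	if ref.Kind != "" {
		sel[kindLabel] = ref.Kind
	}
	return sel
}

// selectArtifacts returns the artifacts carrying every label in sel.
func selectArtifacts(all []Artifact, sel map[string]string) []Artifact {
	var out []Artifact
	for _, a := range all {
		matches := true
		for k, v := range sel {
			if a.Labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	all := []Artifact{
		{Name: "podinfo-git-1", Revision: "main/abc123",
			Labels: map[string]string{kindLabel: "GitRepository", nameLabel: "podinfo"}},
		{Name: "podinfo-bucket-1", Revision: "e3b0c442",
			Labels: map[string]string{kindLabel: "Bucket", nameLabel: "podinfo"}},
		{Name: "other-git-1", Revision: "main/def456",
			Labels: map[string]string{kindLabel: "GitRepository", nameLabel: "other"}},
	}
	wildcard := selectArtifacts(all, selectorFor(SourceRef{Name: "podinfo"}))
	typed := selectArtifacts(all, selectorFor(SourceRef{Kind: "GitRepository", Name: "podinfo"}))
	fmt.Println(len(wildcard), len(typed)) // prints "2 1"
}
```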


Is this in line with your experience on the subject, or do you have an alternative model in mind that may be a better fit?

hiddeco avatar Oct 14 '21 16:10 hiddeco

@hiddeco I don't think I quite grasp what you're driving at with your description. Do you have an example of what concrete resources you'd expect the user to create and then to have resulted when reconciliation is complete? I guess I just don't quite grasp the relationship between "Domain specific 'producers'" and the Artifact resource.

nebhale avatar Oct 15 '21 16:10 nebhale

Domain specific 'producers' are things like GitRepository, Bucket, etc. resources.

At present the Artifact is embedded into these resources, which results in a requirement of the consumer (e.g. the kustomize-controller) being aware of all possible types. If these resources would just produce other Artifact objects, the consumer only has to be aware of the Artifact resource, and how they can be found based on a typed reference.

A user would still just create a GitRepository, and refer to it somewhere else (e.g. a Kustomization).

An additional thing this would allow is a historical list of artifacts for an object (because there is now a one-to-many possibility, as opposed to a one-to-one), which could be an enabler for more advanced use-cases (rollbacks, diffs, ..?).

hiddeco avatar Oct 15 '21 16:10 hiddeco

I don't think that we currently have the issue that this is trying to address because we treat the resource as a duck-type (we call it "Latest Source" but now that I've been in the Flux code, "Artifact" is the accurate name). When it comes time to consume any of the resource types currently exposed by the controller (or the ones that we've written directly), we don't actually deserialize into their concrete Go structs, but rather do a partial deserialization of the resource into our "Duck struct". The pattern is analogous to Go's consumer-side interfaces and doesn't require us to be aware of all possible types.

For our usage, we prefer both creation of the resource and the return of Artifact in the status of that same resource because we consider them intrinsically linked. Our use-cases don't really gain an advantage by using the "producer resources" as a point of decoupling and if Flux changed to the proposed design, we'd end up using cluster-unique label values to (sub-optimally) replicate the ObjectReference-ability that we currently have today.

nebhale avatar Oct 15 '21 16:10 nebhale

The historical list of artifacts angle is a good one and probably quite useful to add to the domain resources. I think there are some positive analogs with kpack's Image/Build relationship. What stands out to me though is that the "latest" build coordinate is still always available in the Image so that a user can refer to the Image both for creation and status updates if they don't care about the individual builds involved.

Addition of Artifact resources for your described use-case seems useful, but I wouldn't want to also remove them from the domain resources.

nebhale avatar Oct 15 '21 16:10 nebhale

I don't think that we currently have the issue that this is trying to address

@nebhale There are two related issues (but you can have one without the other):

  • the kinds that are allowed in a source reference are hard-coded, meaning you cannot add a new kind of source and expect e.g., a Kustomization resource to be able to refer to it;
  • the consumer controllers must watch all the known source types to see when something changes, and if you add a new kind of source, all the controllers won't know about it.

You can avoid the first in your own consumer controller because you're in charge of the custom type and you can just use a reference without a constraint on the kind. I don't know how you avoid the second problem.

squaremo avatar Oct 18 '21 08:10 squaremo

@squaremo A good example of this is in our implementation of the Service Binding for Kubernetes specification. In that, there is a ServiceBinding resource type that uses a LocalObjectReference to refer to a resource of any kind. The only requirement for that resource is that it have a .status.binding.name that points to a Secret (the Provisioned Service duck type). At no point does the Service Binding controller know anything about the concrete GVK of the workload specified in the type, including as it watches for changes to the resource. Any user can create any conformant resource and we can work against it without foreknowledge of the type.

I believe (and to be clear I've not looked deeply into Flux around this point) this same effect can be achieved for the two points that you've raised. In both cases, the controllers that consume a "Latest Artifact" conformant resource needn't know about any resource type whatsoever. Instead, they just use a LocalObjectReference to get/watch a resource and when the JSON payload comes in, instead of deserializing into the concrete types (what you've needed to hardcode so far) they deserialize into a (forgive the freehand Go)

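// Artifact stands for the Flux artifact type; every other field of the
// resource is ignored during decoding.
type LatestArtifact struct {
    Status struct {
        Artifact Artifact `json:"artifact"`
    } `json:"status"`
}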

Since this is the contract of the thing you're trying to consume, you can safely ignore every other part of the resource payload and never know about any of it.
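
A runnable sketch of that partial deserialization, using only encoding/json. The Artifact fields shown (url, revision), the latestArtifact helper, and the sample payloads (including the ExampleSource kind) are assumptions for illustration; the point is only that two resources of different kinds decode through the same duck struct.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Artifact holds only the fields this sketch needs; the field names are
// assumptions, not a copy of the Flux type.
type Artifact struct {
	URL      string `json:"url"`
	Revision string `json:"revision"`
}

// LatestArtifact is the duck type: any resource exposing status.artifact
// conforms, whatever its kind or spec looks like.
type LatestArtifact struct {
	Status struct {
		Artifact *Artifact `json:"artifact"`
	} `json:"status"`
}

// latestArtifact decodes just the contract and ignores everything else.
func latestArtifact(raw []byte) (*Artifact, error) {
	var duck LatestArtifact
	if err := json.Unmarshal(raw, &duck); err != nil {
		return nil, err
	}
	if duck.Status.Artifact == nil {
		return nil, errors.New("resource does not expose status.artifact")
	}
	return duck.Status.Artifact, nil
}

func main() {
	// Two made-up payloads of different kinds with unrelated specs.
	payloads := []string{
		`{"kind":"GitRepository","spec":{"url":"https://example.com/repo"},
		  "status":{"artifact":{"url":"http://source/git.tar.gz","revision":"main/abc123"}}}`,
		`{"kind":"ExampleSource","spec":{"anything":true},
		  "status":{"artifact":{"url":"http://source/custom.tar.gz","revision":"v1"}}}`,
	}
	for _, p := range payloads {
		a, err := latestArtifact([]byte(p))
		if err != nil {
			panic(err)
		}
		fmt.Println(a.Revision, a.URL)
	}
}
```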

nebhale avatar Oct 18 '21 15:10 nebhale

Instead, they just use a LocalObjectReference to get/watch a resource

This is the tricky bit: controller-runtime is very keen that you create watches up-front, by type. I'm sure it's possible to create and discard watches dynamically (it's all the same client-go machinery underneath), you're just on your own somewhat.

My concern is about having extensibility; modulo complexity, I don't mind too much whether that's accomplished with exact types (via Artifact) or using duck-typing. @hiddeco wdyt?

squaremo avatar Oct 18 '21 16:10 squaremo

If users can install Flux source-extensions with their own unique GVK's that implement a Duck Type:

  • the consumer-controllers' watches can be built with a label selector on CRD's OR added just-in-time when there are sourceRef's that request a particular GVK.
  • the consumer-controllers' ServiceAccount RBAC can be extended using ClusterRole Aggregation: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#aggregated-clusterroles

Important to think about whether source-extensions will use their own apiGroup, because that does add another string for a user to lookup/specify/ensure-is-correct which can affect usability.

CrossNamespaceSourceReference already includes an optional apiVersion field, but nobody currently needs to use that in-practice, and I'm actually not sure what the behavioral consequence of setting it currently is: https://fluxcd.io/docs/components/kustomize/api/#kustomize.toolkit.fluxcd.io/v1beta2.CrossNamespaceSourceReference

stealthybox avatar Oct 18 '21 20:10 stealthybox

It's great to be thinking forward towards extensibility. However, do we want OCI images to be supported in the core API's?

I imagine this will be used more often than Bucket sources due to:

  • availability of an image registry when using Kubernetes
  • metadata and versioning when using image repos vs. buckets
  • image signing workflows such as https://www.sigstore.dev/
  • tooling that several orgs (VMware, Azure, RedHat) are building around RegOps

stealthybox avatar Oct 18 '21 20:10 stealthybox

"Why can't I specify an image source using a semver range?" or just "Why are there two different schemes for referring to images?".

@squaremo It's worth noting this is already sort of inconsistent. You can't track non-semver tag changes on GitRepositories, but you can on ImagePolicies.

I've often hoped ImagePolicyChoice and TagFilter would make their way into all of the source objects at some point.

https://fluxcd.io/docs/components/source/gitrepositories/ vs. https://fluxcd.io/docs/components/image/imagepolicies/

stealthybox avatar Oct 18 '21 21:10 stealthybox

However, do we want OCI images to be supported in the core API's?

I can only speak for myself but I think we should ship OCIRepository in the v1beta1 of the source API and not wait months for the artifact decoupling.

stefanprodan avatar Oct 19 '21 06:10 stefanprodan

I propose we include OCI in the v1beta1 API in a way that fits with the current definitions. Later on we can add tag filters and ordering capabilities to both Git and OCI sources, decouple the artifacts and so on.

Here is my proposal:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: OCIRepository
metadata:
  name: podinfo-deploy
  namespace: flux-system
spec:
  interval: 10m
  url: ghcr.io/stefanprodan/podinfo-deploy
  ref:
    # one of
    tag: "latest"
    digest: "sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2"
    semver: "1.x"
  auth:  
    # one of
    secretRef:
      name: regcred
    serviceAccountName: reg
  verify:
    secretRef:
      name: sigstore-keys

The verify field (like for GitRepositories) is optional and the implementation could come as a follow-up PR. I would hold off on releasing OCIRepository until the above spec is fully implemented, including unit tests.

stefanprodan avatar Oct 19 '21 11:10 stefanprodan

Revisiting my initial comments:

I would rather see this as a self-contained controller.

I would like to see the source protocol opened up to extension, but it doesn't have to hold this PR hostage. I do buy the argument made by Leigh that if you support buckets as a "core" type, you should probably support OCI artifacts.

Let's move the architectural discussion to another venue -- in fact, I would like to see a more formal (RFC?) process for big design changes. Leigh's design for using impersonation and opening up the source protocol are both good candidates for RFCs.

My other concern is that the API here ignores the existing API for specifying image

I have reconciled myself to that ship having sailed for the v1 API 😭 Changing this is going to be a longer process, and certainly making the APIs consistent will have to wait for a v2(alpha1). Of course, once the source protocol is opened up, there's nothing stopping a third party from making their own controllers and APIs.

In sum: I'm persuaded that getting this new kind of source into users' hands takes priority over redesigning APIs.

Re the (second) schema suggestion from Stefan: this looks right to me.

Putting your configs in OCI artifacts is going to be different to using git (because you'll probably have to use some CI to build them, for one thing). Nonetheless, following a tag (yuck, but people do it), or using a specific digest, or following a semver range is a decent set of alternatives to start with. I suspect people will think of config artifacts in much the same way as they think of container images, so I'd expect users to ask for the same flexibility with filtering and ordering that ImagePolicy has. But you have to start somewhere.
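
For the semver alternative, a deliberately simplified Go sketch of picking the newest tag in a range. It only understands plain MAJOR.MINOR.PATCH tags and ranges of the form "N.x"; the helper names are made up, and a real implementation would presumably rely on a full semver library rather than this hand-rolled comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion accepts plain MAJOR.MINOR.PATCH tags only; anything else
// (such as "latest") is reported as not a version.
func parseVersion(tag string) ([3]int, bool) {
	parts := strings.Split(tag, ".")
	if len(parts) != 3 {
		return [3]int{}, false
	}
	var v [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return [3]int{}, false
		}
		v[i] = n
	}
	return v, true
}

// less orders two versions by major, then minor, then patch.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestInRange returns the highest tag whose major version matches a
// range of the form "N.x". Other range syntaxes are not handled here.
func latestInRange(tags []string, rng string) (string, bool) {
	if !strings.HasSuffix(rng, ".x") {
		return "", false
	}
	major, err := strconv.Atoi(strings.TrimSuffix(rng, ".x"))
	if err != nil {
		return "", false
	}
	best, bestV, found := "", [3]int{}, false
	for _, tag := range tags {
		v, ok := parseVersion(tag)
		if !ok || v[0] != major {
			continue // not a version, or outside the requested major
		}
		if !found || less(bestV, v) {
			best, bestV, found = tag, v, true
		}
	}
	return best, found
}

func main() {
	tags := []string{"latest", "6.0.0", "6.0.1"}
	tag, ok := latestInRange(tags, "6.x")
	fmt.Println(tag, ok) // prints "6.0.1 true"
}
```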

I don't think it's necessary to implement verification (or to have it in the API) before making the new kind of source available to people, if that's going to hold things up. The arguments for deferring filters, not reworking the API, etc. etc., can also be directed at verification. (Ultimately, the release schedule is a matter for the source-controller maintainers, though.)

squaremo avatar Oct 20 '21 10:10 squaremo

@stefanprodan, @squaremo, and @hiddeco thanks for the lively discussion and I'm looking forward to being involved as Flux evolves. I also deeply appreciate the pragmatic approach to getting the OCI Image functionality in place and then iterating towards a more generalized design across the whole system. Please look out for a concrete PR shortly with proposed changes and testing included.

nebhale avatar Oct 20 '21 18:10 nebhale

Late to the party, but thanks for the valuable insights @nebhale, this has been really fruitful to get more angles of what extensibility would look like, and how it would serve different usages so that we can ensure we evolve things into something that suits (almost) all.

I am looking forward to the concrete PR, and (hopefully) further collaborations :sunflower::100:

hiddeco avatar Oct 20 '21 19:10 hiddeco

I have prepared a repo for e2e tests and demos at ghcr.io/stefanprodan/podinfo-deploy with latest, 6.0.1 and 6.0.0 tags. After a podinfo release, an immutable tag is created for semver, then the latest moves to that tag.

The image contains the following manifests (which are taken from here):

$ crane export ghcr.io/stefanprodan/podinfo-deploy:latest - | tar -tvf -
-rw-r--r--  0 0      0        1713 Oct 21 11:44 deployment.yaml
-rw-r--r--  0 0      0         419 Oct 21 11:44 hpa.yaml
-rw-r--r--  0 0      0         126 Oct 21 11:44 kustomization.yaml
-rw-r--r--  0 0      0         271 Oct 21 11:44 service.yaml

$ crane export ghcr.io/stefanprodan/podinfo-deploy:6.0.1 - | tar -Oxf - deployment.yaml | grep ghcr
        image: ghcr.io/stefanprodan/podinfo:6.0.1

$ crane export ghcr.io/stefanprodan/podinfo-deploy:6.0.0 - | tar -Oxf - deployment.yaml | grep ghcr
        image: ghcr.io/stefanprodan/podinfo:6.0.0

The images are signed with cosign:

$ cat cosign.pub
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEST+BqQ1XZhhVYx0YWQjdUJYIG5Lt
iz2+UxRIqmKBqNmce2T+l45qyqOs99qfD7gLNGmkVZ4vtJ9bM7FxChFczg==
-----END PUBLIC KEY-----

$ cosign verify --key cosign.pub ghcr.io/stefanprodan/podinfo-deploy

Verification for ghcr.io/stefanprodan/podinfo-deploy --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
  - Any certificates were verified against the Fulcio roots.
{"critical":{"identity":{"docker-reference":"ghcr.io/stefanprodan/podinfo-deploy"},"image":{"docker-manifest-digest":"sha256:2c70b816cf4213db92d1d95206aea5b79fa7d59d56fd7a4186c5d9b5b4c3f120"},"type":"cosign container image signature"},"optional":null}

Using the above image we could swap the podinfo GitRepository with an OCIRepository like so:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: OCIRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  url: ghcr.io/stefanprodan/podinfo-deploy
  ref:
    tag: "latest"
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 5m0s
  path: "./"
  prune: true
  sourceRef:
    kind: OCIRepository
    name: podinfo
  targetNamespace: default

Currently I'm using crane to build the OCI artifact, but ideally this will be incorporated into the flux CLI, e.g.

flux bundle -f ./kustomize -t ghcr.io/stefanprodan/podinfo-deploy:6.0.1 -s cosign.key

stefanprodan avatar Oct 21 '21 08:10 stefanprodan

Will update this PR. RFC https://github.com/fluxcd/flux2/pull/2601 pending

rashedkvm avatar Apr 11 '22 15:04 rashedkvm

Superseded by https://github.com/fluxcd/source-controller/pull/788

stefanprodan avatar Sep 22 '22 11:09 stefanprodan