[SURE-7844] Spike: Use OCI repo to store bundle resources
What problems do we face when we try to use an OCI registry as storage for bundle resources?
Background
The bundle creation process consists of three steps, which correspond to the “apply”, “target” and “deploy” CLI commands:
- Fleet converts resources, like YAML manifests, kustomize and helm charts, to “bundle” resources.
- These bundle resources are copied to bundledeployment/content resources; this is called “targeting”.
- The bundledeployment/content resources are downloaded via the k8s API to target clusters and converted into a helm chart, which is then installed.
See also: https://fleet.rancher.io/ref-bundle-stages
The bundle resource contains all the resources from the input folders, and its size is limited by the maximum object size that etcd accepts. Since bundles can contain multiple helm charts, even multiple versions of the same helm chart, it’s possible to create bundles that are too big to be stored in k8s. In that case, Fleet will error out and the user will need to find a way to make the bundle smaller, e.g. by splitting it. Fleet supports compressed, base64-encoded output, but that’s only a mitigation for the underlying problem: large resources are stored in etcd via the k8s API.
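To illustrate why compression plus base64 only mitigates the size problem, here is a minimal sketch that measures a payload after gzip and base64 and compares it against a limit. The `ASSUMED_LIMIT_BYTES` value and the function names are hypothetical and chosen for illustration; the real ceiling depends on how etcd and the API server are configured, and this is not Fleet's actual encoding code.

```python
import base64
import gzip
import os

# Hypothetical limit, for illustration only. The real ceiling depends on
# the etcd and API server configuration of the cluster.
ASSUMED_LIMIT_BYTES = 1024 * 1024


def encoded_size(raw: bytes) -> int:
    """Return the size of raw after gzip compression and base64 encoding."""
    return len(base64.b64encode(gzip.compress(raw)))


def fits(raw: bytes, limit: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Report whether the compressed, encoded payload stays within limit."""
    return encoded_size(raw) <= limit


# Repetitive YAML compresses well; already-compressed or random data does not,
# and base64 then adds roughly a third on top.
repetitive = b"apiVersion: v1\nkind: ConfigMap\n" * 100_000
incompressible = os.urandom(2 * 1024 * 1024)
print(len(repetitive), encoded_size(repetitive), fits(repetitive))
print(len(incompressible), encoded_size(incompressible), fits(incompressible))
```

Highly repetitive manifests shrink a lot, but content that does not compress well still exceeds the limit once it is base64 encoded, which is the case the OCI storage is meant to cover.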
Note: since Fleet uses the Helm SDK with the secret backend, there is an additional size limit for bundles. However, the secret size limit should be large enough for most helm charts, especially since the secret only contains deployed resources.
Spike
The bundle resource will have an empty resource list and a reference to the OCI repository server. The bundledeployment will not point to a content resource, but to an OCI repository server instead.
When a Fleet operator has their own OCI repository server, they can configure Fleet bundles to use the server, instead of storing data in k8s. Fleet will still use the bundle and bundledeployment resources, e.g. for targeting clusters, but the bundle’s resources: list will be empty and the bundledeployment’s deploymentID will not point to a content resource. The bundle data is stored in the OCI registry only once, not twice (bundle.resources and content) as before.
Fleet will store the resource list as an OCI object under a given URL, e.g.
localhost:5000/fcontent/s-RANDOM_ID:v1
The object consists of the bundle resource data and an OCI manifest. Fleet won’t use the version number (?).
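As a rough sketch of how such a URL breaks down, the snippet below splits a reference of the form `registry/repository:tag` and derives the manifest endpoint defined by the OCI distribution spec (`/v2/<name>/manifests/<reference>`). The `OCIRef`, `parse_ref` and `manifest_url` names are hypothetical, the `s-abc123` repository name is a made-up stand-in for `s-RANDOM_ID`, and digest references (`@sha256:...`) are deliberately not handled.

```python
from typing import NamedTuple


class OCIRef(NamedTuple):
    registry: str    # host and optional port, e.g. "localhost:5000"
    repository: str  # e.g. "fcontent/s-abc123"
    tag: str         # e.g. "v1"


def parse_ref(ref: str) -> OCIRef:
    """Split 'registry/repository:tag' into its parts (no digest support)."""
    registry, _, rest = ref.partition("/")
    if not rest:
        raise ValueError(f"missing repository in {ref!r}")
    repository, sep, tag = rest.rpartition(":")
    if not sep or not repository or not tag:
        raise ValueError(f"missing tag in {ref!r}")
    return OCIRef(registry, repository, tag)


def manifest_url(ref: OCIRef, scheme: str = "https") -> str:
    """Build the manifest endpoint from the OCI distribution spec."""
    return f"{scheme}://{ref.registry}/v2/{ref.repository}/manifests/{ref.tag}"


ref = parse_ref("localhost:5000/fcontent/s-abc123:v1")
print(ref)
print(manifest_url(ref, scheme="http"))
```

The first path segment is treated as the registry so that the port separator in `localhost:5000` is not mistaken for the tag separator.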
Fleet should support authenticated connections to the OCI registry.
Out of scope:
- automatic installation and setup of the OCI registry
- garbage collection
- downstream agents without access to the registry, air gap, etc.
In scope:
- [ ] can we rely on a random repository name to isolate workloads per agent? Like we rely on a random content resource name now?
- [ ] how do OCI registries handle access controls? For example https://zotregistry.dev/v2.0.1/articles/authn-authz/#example-access-control-configuration.
- [ ] can we support an installation wide default registry and override it per cluster?
- [ ] can/should we sync credentials to downstream cluster's secrets automatically? normally fleet doesn't.
Pretty sure that this is the problem here: https://github.com/rancher/fleet/issues/2388
I've created https://github.com/rancher/fleet/issues/2465 to add a feature flag to enable this as experimental. Once that is implemented, we could try to merge https://github.com/rancher/fleet/pull/2375
can we rely on a random repository name to isolate workloads per agent? Like we rely on a random content resource name now?
Yes, for the tested registries (zot, Docker Hub), we can.
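A minimal sketch of what generating such a random repository name could look like, assuming the `s-` prefix from the example URL above. The `random_repository_name` helper and the 16-byte length are hypothetical, not Fleet's actual scheme. Lowercase hex is used because OCI repository names are restricted to lowercase characters.

```python
import secrets


def random_repository_name(prefix: str = "s-", nbytes: int = 16) -> str:
    """Return prefix plus a cryptographically random lowercase hex ID."""
    # secrets draws from the OS CSPRNG, so names are not predictable
    # the way values from the random module would be.
    return prefix + secrets.token_hex(nbytes)


name = random_repository_name()
print(f"localhost:5000/fcontent/{name}:v1")
```

Isolation then rests entirely on the name being unguessable, so this only holds if the registry does not let agents list repositories.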
how do OCI registries handle access controls? For example https://zotregistry.dev/v2.0.1/articles/authn-authz/#example-access-control-configuration.
Basic-auth.
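For reference, HTTP basic auth (RFC 7617) amounts to sending the base64 of `user:password` in an `Authorization` header. The sketch below shows only that encoding; the `basic_auth_header` name is hypothetical, and how Fleet stores or passes the credentials is not covered here.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP basic auth (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


# Example credentials from the RFC, not real ones.
print(basic_auth_header("Aladdin", "open sesame"))
```

Since the credentials are only encoded, not encrypted, this is only reasonable over TLS.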
can we support an installation wide default registry and override it per cluster?
We could, but going for "per gitrepo" first.
can/should we sync credentials to downstream cluster's secrets automatically? normally fleet doesn't.
Making this experimental and opening a security assessment request.
Thx, we just ran into the issue of the Kubernetes resource limit for the bundle (about 100 overlays in 1 bundle). So we're very much looking forward to this. ;)