Transparent resource adoption
Is your feature request related to a problem?
It seems that with `AdoptedResources`, I must know ahead of time whether the resource already exists within AWS and make a choice based on this:

- if the resource already exists within AWS, I must specify an `AdoptedResource` to adopt it;
- but if the resource does not exist within AWS, I must specify a new ACK resource (e.g. an IAM policy resource) to create it (both paths are sketched below).
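To illustrate the two paths (a sketch only; I'm using the IAM controller's `Policy` kind as an example here, and the ARN, names, and policy document are placeholders):

```yaml
# Path 1: the policy already exists in AWS, so I must adopt it.
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: my-policy-adoption
spec:
  aws:
    arn: arn:aws:iam::123456789012:policy/my-policy
  kubernetes:
    group: iam.services.k8s.aws
    kind: Policy
    metadata:
      name: my-policy
---
# Path 2: the policy does not exist yet, so I must define it directly.
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Policy
metadata:
  name: my-policy
spec:
  name: my-policy
  policyDocument: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
      ]
    }
```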
This makes declarative, idempotent GitOps difficult or impossible if I do not already know the full state of the target environment. For example, if I am tearing down and rebuilding environments rapidly (which I do during development), some resources may not be cleaned up properly during teardown, so I would have to use an `AdoptedResource` when rebuilding. Others will have been destroyed correctly, so I would have to specify the actual ACK resource (e.g. an IAM policy resource) to recreate it.
Describe the solution you'd like
I would like the option for an ACK resource (e.g. an IAM policy resource) to transparently create or adopt the AWS resource in the target environment. If the AWS resource already exists, adopt it and remediate it to match the desired state. If the AWS resource does not already exist, create it.
So that this is non-breaking, I suggest a new annotation for all ACK resources:

```yaml
annotations:
  services.k8s.aws/force_adoption: "true"
```

The default should be `"false"` (current behaviour), but when set to `"true"`, defined ACK resources will automatically start managing existing AWS resources if they already exist.
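With that in place, a single manifest would converge regardless of the prior state of AWS. A sketch (the annotation is my proposal above and does not exist today; everything else is a placeholder):

```yaml
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Policy
metadata:
  name: my-policy
  annotations:
    # Proposed, not yet implemented: adopt the AWS policy if it
    # already exists, otherwise create it as normal.
    services.k8s.aws/force_adoption: "true"
spec:
  name: my-policy
  policyDocument: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
      ]
    }
```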
Describe alternatives you've considered
Can't think of anything. I've tried everything I can think of with the current `AdoptedResources` functionality, but I can't get it to do what I want.
TL;DR:
Hi @liger1978, thank you for bringing this to our attention and proposing a creative solution. However, I believe that if ACK deletion logic works as expected, your stack should be GitOps compliant without the need to adopt any resources.
Any resource adoption, if needed, should be a one-time thing; after that, the ACK resource generated by the adoption can be managed in a GitOps fashion.
Original Problem
> some resources may not be cleaned up properly during teardown
How are you performing the cleanup? Are you manually deleting resources from AWS, or deleting the K8s resource manifests? If it's the latter and there is a bug in ACK resource deletion, do let us know.
And during teardown, wouldn't you want to make sure that all the resources were successfully deleted before recreating the stack? Why reuse old resources that were meant to be deleted?
Overall GitOps Experience
If you started with no adopted resources and were creating the whole stack using ACK resources, I would rather fix the bugs in the deletion logic and make sure an idempotent GitOps experience is achievable that way.
Current Adopted Resource GitOps Experience
Currently, if an `AdoptedResource` is present in the GitOps manifests, it will create an ACK resource that is not present in those manifests. I can imagine a two-step process to be completely GitOps compliant: first you add the `AdoptedResource` to the GitOps manifests, which will create an ACK resource; then, after successful adoption, you replace the `AdoptedResource` inside the GitOps manifests with the actual ACK resource manifest.

Once the resource is adopted and the manifests are updated, the resulting manifests will be GitOps compliant. And during tear-down + recreation, you will not need to adopt the resource again.
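As a sketch of that two-step flow (the `AdoptedResource` shape is the existing CRD; the IAM policy, names, and ARN are illustrative):

```yaml
# Step 1: commit an AdoptedResource; the controller creates the
# corresponding Policy resource in the cluster from the existing
# AWS policy.
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: my-policy-adoption
spec:
  aws:
    arn: arn:aws:iam::123456789012:policy/my-policy
  kubernetes:
    group: iam.services.k8s.aws
    kind: Policy
    metadata:
      name: my-policy
```

Step 2: once adoption succeeds, export the generated resource (e.g. `kubectl get policies.iam.services.k8s.aws my-policy -o yaml`), commit that manifest to Git, and remove the `AdoptedResource` from the manifests.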
> How are you performing the cleanup? Are you manually deleting resources from AWS, or deleting the K8s resource manifests?
In some cases the managing EKS cluster is destroyed and rebuilt. This is typically the situation that causes the ACK-managed resources in AWS to be orphaned in our case. When the cluster is rebuilt and processes its GitOps repo, the reinstalled ACK controllers attempt to recreate the orphaned resources and fail as they already exist. There is no bug AFAIK.
> And during teardown, wouldn't you want to make sure that all the resources were successfully deleted before recreating the stack? Why reuse old resources that were meant to be deleted?
Not when we tear down the managing cluster. We typically don't want to destroy the AWS resources the cluster manages with the ACK controllers, just the cluster itself.
@liger1978, gotcha! Thanks for providing more context.
@vijtrip2 No problem. Another scenario, this time in production instead of development, is geographic failover of our EKS management cluster.
Management cluster `cluster1` in us-east-1 goes down due to a regional outage, so we quickly spin up `cluster2` in us-west-1 pointed at the same GitOps repo. We would like all the existing ACK resources defined in the repo to be automatically adopted when the new cluster takes over their management.
@liger1978 Thanks for bringing this use-case to our attention. This annotation strategy was discussed during the design of the `AdoptedResource`. We haven't entirely written it off, but there are some other issues with it that I wanted to let you know about.
Not all resources can be defined using the properties in their spec. Many AWS resources use an auto-generated name (such as the EC2 instance ID), and then require that all references are made using this name or the ARN. In those cases, there is no combination of spec fields that would properly define which existing resource to adopt. The controller would treat every new K8s custom resource as a new, separate object, leaving the existing resources hanging. This was the biggest reason we went with a separate CRD for adoption, so that we didn't need to modify the spec fields for any existing resources.
Secondly, although the spec of a K8s CR should define the full desired state of a resource, most of the time a user will only provide a partial spec for an ACK resource and rely on the AWS service to fill in the defaults. That is, you probably aren't going to use every single field in every single ACK custom resource, but instead rely on the fact that the AWS service will use the default values for anything left undefined. In those cases, the ACK controller persists the server-side defaults back into the spec of the object so that the next reconciliation loop understands what the default values are - and therefore whether to attempt to override them (if modified).

When you adopt an existing AWS resource using an annotation on a partially defined ACK resource, the K8s controller cannot know what the server-side default value is. Most of the time, these defaults are only returned when we create the resource for the first time, so if we submit an undefined value as part of a subsequent `Modify*` call to the service, it could simply return an error. Therefore, because we aren't handling the full lifecycle of every object, we can't guarantee it would match the expected configuration of one created through ACK.
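As a purely hypothetical illustration of that default-persistence behaviour (the defaulted field and its value are invented for the example):

```yaml
# What the user commits: a partial spec.
spec:
  name: my-bucket
---
# What the controller persists back into the object after the initial
# Create call, once the service has reported its server-side defaults
# (field name invented for illustration):
spec:
  name: my-bucket
  someDefaultedField: "service-default-value"
```

If the resource was adopted rather than created, the controller never sees that Create response, so it has no record of `someDefaultedField` to compare against on later reconciliations.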
I apologise for the large paragraphs of text, but I realise these nuances have not been explained anywhere else in the design documents or online documentation for the adopted resource CRDs. I would love to provide an annotation to allow resource adoption in the way you described, but I worry that these cases (admittedly they are edge cases) could end up causing a confusing user experience. That is not to say we won't ever support functionality like this - GitOps compliance is incredibly important to the project and suggestions provide important insights into how the controllers are being used.
@liger1978 Thanks for bringing this use-case to our attention.
@RedbackThomson No problem!
For clarity, I am henceforth going to refer to my proposed process of managing existing AWS resources as "absorption" to make it distinct from your existing adoption process.
> Not all resources can be defined using the properties in their spec. Many AWS resources use an auto-generated name (such as the EC2 instance ID), and then require that all references are made using this name or the ARN. In those cases, there is no combination of spec fields that would properly define which existing resource to adopt. The controller would treat every new K8s custom resource as a new, separate object, leaving the existing resources hanging.
OK, a couple of options for absorption here:

1. Only successfully absorb by `spec.name`, where the name is a unique identifier of an existing AWS resource. If the name does not resolve to an existing unique AWS resource, then create a new one. This will work fine for some resources but, as you have noted, will never allow absorption of resources like EC2 instances where names are generated. It will still be useful for many use cases.
2. Absorb by ARN. This is useful where the generated ARN is predictable. It is similar to option 1, but removes any ambiguity at all about what will be absorbed, e.g.:
   ```yaml
   metadata:
     annotations:
       services.k8s.aws/absorb_existing: "true"
       services.k8s.aws/absorb_match_arn: "arn:aws:iam::123456789012:policy/my-policy"
   ```
3. Absorb by specified AWS tags. If a search based on the tags resolves to a single AWS resource, then the controller has successfully found the resource to absorb. If not, then it will create a new one, e.g.:
   ```yaml
   metadata:
     annotations:
       services.k8s.aws/absorb_existing: "true"
       services.k8s.aws/absorb_match_tags: "role=bastion_server,env=dev"
   ```
Option 2 is likely the easiest to implement and has the least ambiguity about what is going to happen; it would suit us as things stand. Option 3 would cover more resource types, and it is possible we will require it in future, in addition to option 2, as we manage more resource types with ACK.
> Secondly, although the spec of a K8s CR should define the full desired state of a resource, most of the time a user will only provide a partial spec for an ACK resource and rely on the AWS service to fill in the defaults. That is, you probably aren't going to use every single field in every single ACK custom resource, but instead rely on the fact that the AWS service will use the default values for anything left undefined. In those cases, the ACK controller persists the server-side defaults back into the spec of the object so that the next reconciliation loop understands what the default values are - and therefore whether to attempt to override them (if modified). When you adopt an existing AWS resource using an annotation on a partially defined ACK resource, the K8s controller cannot know what the server-side default value is. Most of the time, these defaults are only returned when we create the resource for the first time, so if we submit an undefined value as part of a subsequent `Modify*` call to the service, it could simply return an error.
I would be perfectly happy with this. The K8s resource would be in an error state, and hopefully the API would return the missing value, which would be displayed in the `status` field and the controller logs. If I intend to use absorption, it is up to me to fully specify my resource in my K8s spec. An appropriate caveat emptor can be added to the docs, and as long as the error in the K8s resource `status` field is clear, it should be easy to see what is up and add the missing values to the spec.
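For example, I'd expect a failed `Modify*` call to surface something like this in the resource status (shape based on ACK's standard `ACK.ResourceSynced` condition; the message text is invented):

```yaml
status:
  conditions:
    - type: ACK.ResourceSynced
      status: "False"
      message: "cannot absorb: field someDefaultedField is not set in the spec"
```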
> I apologise for the large paragraphs of text, but I realise these nuances have not been explained anywhere else in the design documents or online documentation for the adopted resource CRDs. I would love to provide an annotation to allow resource adoption in the way you described, but I worry that these cases (admittedly they are edge cases) could end up causing a confusing user experience. That is not to say we won't ever support functionality like this - GitOps compliance is incredibly important to the project and suggestions provide important insights into how the controllers are being used.
You have nothing to apologise for. I appreciate the quick and thoughtful responses from the ack team! I understand the original design decisions, but as things stand we can't really do idempotent GitOps where our target environments are remediated to match the desired state in the repo.
Actually, I have been mulling this over for a while and I'm starting to come back around on this idea.
I'd say the main use-case for the `AdoptedResource` custom resource was to support users who were previously using other tools (CloudFormation, Terraform) without requiring them to rewrite all of their definitions with ACK. They would not be able to use your annotation-based solution, because they would not be able to provide the bare minimum required fields to create the custom resource. Instead, they would apply a set of `AdoptedResource` objects with the names (as exported from their current tooling) and then be able to download the YAML for future reference.

However, you're offering a different situation, wherein a user already has fully-formed manifests, created elsewhere, and wants to continue reconciliation in a new context. Apart from the edge cases I identified previously, there isn't anything fundamentally wrong with that.
Similar disaster recovery use case from EBS CSI driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1160
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle rotten
/lifecycle frozen
I suggest that adopting existing resources by tag is enough.

At least by default at the moment, there are these tags:

- `services.k8s.aws/controller-version` exists with the form `%CONTROLLER_SERVICE%-%CONTROLLER_VERSION%`
- `services.k8s.aws/namespace` exists with the form `%K8S_NAMESPACE%`

If you additionally had a tag with the Kubernetes resource name, e.g.

- `services.k8s.aws/name` existed with the form `%K8S_RESOURCE_NAME%`

then you should have all you need to link a resource back to its recreated Kubernetes form, as sketched below.
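For example, an orphaned AWS resource carrying the following tags could be matched unambiguously back to its manifest (values illustrative; the last tag is the proposed addition):

```yaml
# Tags on the orphaned AWS resource. The first two follow the default
# forms described above; services.k8s.aws/name does not exist today.
services.k8s.aws/controller-version: iam-v0.1.0
services.k8s.aws/namespace: prod
services.k8s.aws/name: my-policy
```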
Have you got any update on this issue?