Transparent resource adoption
Is your feature request related to a problem?
It seems that with `AdoptedResources`, I must know ahead of time whether the resource already exists within AWS and make a choice based on this:

- if the resource already exists within AWS, I must specify an `AdoptedResource` to adopt it;
- but if the resource does not exist within AWS, I must specify a new ACK resource (e.g. an IAM policy resource) to create it (both paths are sketched below).
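To illustrate the two paths (a sketch only; I'm using the IAM controller's `Policy` kind as an example here, and the ARN, names, and policy document are placeholders):

```yaml
# Path 1: the policy already exists in AWS, so I must adopt it.
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: my-policy-adoption
spec:
  aws:
    arn: arn:aws:iam::123456789012:policy/my-policy
  kubernetes:
    group: iam.services.k8s.aws
    kind: Policy
    metadata:
      name: my-policy
---
# Path 2: the policy does not exist yet, so I must define it directly.
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Policy
metadata:
  name: my-policy
spec:
  name: my-policy
  policyDocument: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
      ]
    }
```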
This makes declarative, idempotent GitOps difficult or impossible if I do not already know the full state of the target environment. For example, if I am tearing down and rebuilding environments rapidly (which I do during development), some resources may not be cleaned up properly during teardown, so I would have to use an `AdoptedResource` when rebuilding. Others will have been destroyed correctly, so I would have to specify the actual ACK resource (e.g. an IAM policy resource) to recreate it.
Describe the solution you'd like
I would like the option for an ACK resource (e.g. an IAM policy resource) to transparently create or adopt the AWS resource in the target environment. If the AWS resource already exists, adopt it and remediate it to match the desired state. If the AWS resource does not already exist, create it.
So that this is non-breaking, I suggest a new annotation for all ACK resources:

```yaml
annotations:
  services.k8s.aws/force_adoption: "true"
```

The default should be `"false"` (current behaviour), but when set to `"true"`, defined ACK resources will automatically start managing existing AWS resources if they already exist.
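With that in place, a single manifest would converge regardless of the prior state of AWS. A sketch (the annotation is my proposal above and does not exist today; everything else is a placeholder):

```yaml
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Policy
metadata:
  name: my-policy
  annotations:
    # Proposed, not yet implemented: adopt the AWS policy if it
    # already exists, otherwise create it as normal.
    services.k8s.aws/force_adoption: "true"
spec:
  name: my-policy
  policyDocument: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
      ]
    }
```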
Describe alternatives you've considered
Can't think of anything. I've tried everything I can think of with the current `AdoptedResources` functionality, but I can't get it to do what I want.
TL;DR:
Hi @liger1978, thank you for bringing this to our attention and proposing a creative solution. However, I believe that if ACK deletion logic works as expected, your stack should be GitOps compliant without the need to adopt any resources.
Any resource adoption, if needed, should be a one-time thing; after that, the ACK resource generated by the adoption can be managed in a GitOps fashion.
Original Problem
> some resources may not be cleaned up properly during teardown
How are you performing the cleanup? Are you manually deleting resources from AWS, or deleting the K8s resource manifests? If it's the latter and there is a bug in ACK resource deletion, do let us know.
And during teardown, wouldn't you want to make sure that all the resources were successfully deleted before recreating the stack? Why reuse old resources that were meant to be deleted?
Overall GitOps Experience
If you started with no adopted resources and were creating the whole stack using ACK resources, I would rather fix the bugs in the deletion logic and make sure an idempotent GitOps experience is achievable that way.
Current Adopted Resource GitOps Experience
Currently, if an `AdoptedResource` is present in the GitOps manifests, it will create an ACK resource that is not present in those manifests. I can imagine a two-step process to be completely GitOps compliant: first you add the `AdoptedResource` to the GitOps manifests, which will create an ACK resource; then, after successful adoption, you replace the `AdoptedResource` inside the GitOps manifests with the actual ACK resource manifest.

Once the resource is adopted and the manifests are updated, the resulting manifests will be GitOps compliant. And during tear-down + recreation, you will not need to adopt the resource again.
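As a sketch of that two-step flow (the `AdoptedResource` shape is the existing CRD; the IAM policy, names, and ARN are illustrative):

```yaml
# Step 1: commit an AdoptedResource; the controller creates the
# corresponding Policy resource in the cluster from the existing
# AWS policy.
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: my-policy-adoption
spec:
  aws:
    arn: arn:aws:iam::123456789012:policy/my-policy
  kubernetes:
    group: iam.services.k8s.aws
    kind: Policy
    metadata:
      name: my-policy
```

Step 2: once adoption succeeds, export the generated resource (e.g. `kubectl get policies.iam.services.k8s.aws my-policy -o yaml`), commit that manifest to Git, and remove the `AdoptedResource` from the manifests.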
> How are you performing the cleanup? Are you manually deleting resources from AWS, or deleting the K8s resource manifests?
In some cases the managing EKS cluster is destroyed and rebuilt. This is typically the situation that causes the ACK-managed resources in AWS to be orphaned in our case. When the cluster is rebuilt and processes its GitOps repo, the reinstalled ACK controllers attempt to recreate the orphaned resources and fail as they already exist. There is no bug AFAIK.
> And during teardown, wouldn't you want to make sure that all the resources were successfully deleted before recreating the stack? Why reuse old resources that were meant to be deleted?
Not when we tear down the managing cluster. We typically don't want to destroy the AWS resources the cluster manages with the ACK controllers, just the cluster itself.
@liger1978, gotcha! Thanks for providing more context.
@vijtrip2 No problem. Another scenario, this time in production instead of development, is geographic failover of our EKS management cluster.
Management cluster `cluster1` in us-east-1 goes down due to a regional outage, so we quickly spin up `cluster2` in us-west-1 pointed at the same GitOps repo. We would like all the existing ACK resources defined in the repo to be automatically adopted when the new cluster takes over their management.
@liger1978 Thanks for bringing this use-case to our attention. This annotation strategy was discussed during the design of the `AdoptedResource`. We haven't entirely written it off, but there are some other issues with it that I wanted to let you know about.
Not all resources can be defined using the properties in their spec. Many AWS resources use an auto-generated name (such as the EC2 instance ID), and then require that all references are made using this name or the ARN. In those cases, there is no combination of spec fields that would properly define which existing resource to adopt. The controller would treat every new K8s custom resource as a new, separate object, leaving the existing resources hanging. This was the biggest reason we went with a separate CRD for adoption, so that we didn't need to modify the spec fields for any existing resources.
Secondly, although the spec of a K8s CR should define the full desired state of a resource, most of the time a user will only provide a partial spec for an ACK resource and rely on the AWS service to fill in the defaults. That is, you probably aren't going to use every single field in every single ACK custom resource, but instead rely on the fact that the AWS service will use the default values for anything left undefined. In those cases, the ACK controller persists the server-side defaults back into the spec of the object so that the next reconciliation loop understands what the default values are - and therefore whether to attempt to override them (if modified).

When you adopt an existing AWS resource using an annotation on a partially defined ACK resource, the K8s controller cannot know what the server-side default value is. Most of the time, these defaults are only returned when we create the resource for the first time, so if we submit an undefined value as part of a subsequent `Modify*` call to the service, it could simply return an error. Therefore, because we aren't handling the full lifecycle of every object, we can't guarantee it would match the expected configuration of one created through ACK.
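As a purely hypothetical illustration of that default-persistence behaviour (the defaulted field and its value are invented for the example):

```yaml
# What the user commits: a partial spec.
spec:
  name: my-bucket
---
# What the controller persists back into the object after the initial
# Create call, once the service has reported its server-side defaults
# (field name invented for illustration):
spec:
  name: my-bucket
  someDefaultedField: "service-default-value"
```

If the resource was adopted rather than created, the controller never sees that Create response, so it has no record of `someDefaultedField` to compare against on later reconciliations.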
I apologise for the large paragraphs of text, but I realise these nuances have not been explained anywhere else in the design documents or online documentation for the adopted resource CRDs. I would love to provide an annotation to allow resource adoption in the way you described, but I worry that these cases (admittedly they are edge cases) could end up causing a confusing user experience. That is not to say we won't ever support functionality like this - GitOps compliance is incredibly important to the project and suggestions provide important insights into how the controllers are being used.
@liger1978 Thanks for bringing this use-case to our attention.
@RedbackThomson No problem!
For clarity, I am henceforth going to refer to my proposed process of managing existing AWS resources as "absorption" to make it distinct from your existing adoption process.
> Not all resources can be defined using the properties in their spec. Many AWS resources use an auto-generated name (such as the EC2 instance ID), and then require that all references are made using this name or the ARN. In those cases, there is no combination of spec fields that would properly define which existing resource to adopt. The controller would treat every new K8s custom resource as a new, separate object, leaving the existing resources hanging.
OK, a couple of options for absorption here:

1. Only successfully absorb by `spec.name`, where the name is a unique identifier of an existing AWS resource. If the name does not resolve to an existing unique AWS resource, then create a new one. This will work fine for some resources but, as you have noted, will never allow absorption of resources like EC2 instances where names are generated. It will still be useful for many use cases.
2. Absorb by ARN. This is useful where the generated ARN is predictable. It is similar to option 1, but removes any ambiguity at all about what will be absorbed, e.g.:
   ```yaml
   metadata:
     annotations:
       services.k8s.aws/absorb_existing: "true"
       services.k8s.aws/absorb_match_arn: "arn:aws:iam::123456789012:policy/my-policy"
   ```
3. Absorb by specified AWS tags. If a search based on the tags resolves to a single AWS resource, then the controller has successfully found the resource to absorb. If not, then it will create a new one, e.g.:
   ```yaml
   metadata:
     annotations:
       services.k8s.aws/absorb_existing: "true"
       services.k8s.aws/absorb_match_tags: "role=bastion_server,env=dev"
   ```
Option 2 is likely the easiest to implement and has the least ambiguity about what is going to happen; it would suit us as things stand. Option 3 would cover more resource types, and it is possible we will require it in future, in addition to option 2, as we manage more resource types with ACK.
> Secondly, although the spec of a K8s CR should define the full desired state of a resource, most of the time a user will only provide a partial spec for an ACK resource and rely on the AWS service to fill in the defaults. That is, you probably aren't going to use every single field in every single ACK custom resource, but instead rely on the fact that the AWS service will use the default values for anything left undefined. In those cases, the ACK controller persists the server-side defaults back into the spec of the object so that the next reconciliation loop understands what the default values are - and therefore whether to attempt to override them (if modified). When you adopt an existing AWS resource using an annotation on a partially defined ACK resource, the K8s controller cannot know what the server-side default value is. Most of the time, these defaults are only returned when we create the resource for the first time, so if we submit an undefined value as part of a subsequent `Modify*` call to the service, it could simply return an error.
I would be perfectly happy with this. The K8s resource would be in an error state, and hopefully the API would return the missing value, which would be displayed in the `status` field and the controller logs. If I intend to use absorption, it is up to me to fully specify my resource in my K8s spec. An appropriate caveat emptor can be added to the docs, and as long as the error in the K8s resource `status` field is clear, it should be easy to see what is up and add the missing values to the spec.
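For example, I'd expect a failed `Modify*` call to surface something like this in the resource status (shape based on ACK's standard `ACK.ResourceSynced` condition; the message text is invented):

```yaml
status:
  conditions:
    - type: ACK.ResourceSynced
      status: "False"
      message: "cannot absorb: field someDefaultedField is not set in the spec"
```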
> I apologise for the large paragraphs of text, but I realise these nuances have not been explained anywhere else in the design documents or online documentation for the adopted resource CRDs. I would love to provide an annotation to allow resource adoption in the way you described, but I worry that these cases (admittedly they are edge cases) could end up causing a confusing user experience. That is not to say we won't ever support functionality like this - GitOps compliance is incredibly important to the project and suggestions provide important insights into how the controllers are being used.
You have nothing to apologise for. I appreciate the quick and thoughtful responses from the ack team! I understand the original design decisions, but as things stand we can't really do idempotent GitOps where our target environments are remediated to match the desired state in the repo.
Actually, I have been mulling this over for a while and I'm starting to come back around on this idea.
I'd say the main use-case for the `AdoptedResource` custom resource was to support users who were previously using other tools (CloudFormation, Terraform) without requiring them to rewrite all of their definitions with ACK. They would not be able to use your annotation-based solution, because they would not be able to provide the bare minimum required fields to create the custom resource. Instead, they would apply a set of `AdoptedResource` objects with the names (as exported from their current tooling) and then be able to download the YAML for future reference.

However, you're offering a different situation, wherein a user already has fully-formed manifests, created elsewhere, and wants to continue reconciliation in a new context. Apart from the edge cases I identified previously, there isn't anything fundamentally wrong with that.
Similar disaster recovery use case from EBS CSI driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1160
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle rotten
/lifecycle frozen
I suggest that adopting existing resources by tag is enough.

At least by default at the moment, there are these tags:

- `services.k8s.aws/controller-version` exists with the form `%CONTROLLER_SERVICE%-%CONTROLLER_VERSION%`
- `services.k8s.aws/namespace` exists with the form `%K8S_NAMESPACE%`

If you additionally had a tag with the Kubernetes resource name, e.g.

- `services.k8s.aws/name` existed with the form `%K8S_RESOURCE_NAME%`

then you should have all you need to link a resource back to its recreated Kubernetes form, as sketched below.
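For example, an orphaned AWS resource carrying the following tags could be matched unambiguously back to its manifest (values illustrative; the last tag is the proposed addition):

```yaml
# Tags on the orphaned AWS resource. The first two follow the default
# forms described above; services.k8s.aws/name does not exist today.
services.k8s.aws/controller-version: iam-v0.1.0
services.k8s.aws/namespace: prod
services.k8s.aws/name: my-policy
```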
Have you got any update on this issue?