cloudformation-operator
Report CloudFormation events back to Stack resource
Events from AWS, especially when CF stack creation fails, should be surfaced to the Stack CRD via events or something similar.
Hi @linki, We're trying to use the operator and Helm charts to deploy our CF stacks (k8s workers are EC2 instances), and we observed that even if the CF creation failed in AWS, the Helm chart deploy still shows "Deployed" (I can see the error messages in the Pod logs, though). I'm wondering if there's any way to sync the deploy status of the CRD and AWS, namely having the CRD wait until the AWS CF creation is completed before claiming a successful deploy. Is it something related to this issue? We'd like to contribute to this project as well, but think it's better to discuss the options with you first. Cheers,
@luoyimu1 Yes, that's related to this issue.
Ideally, in the "status" section of the Stack resource we would have information about the real state of the CloudFormation Stack, including whether it succeeded or failed.
Furthermore, it probably makes sense to propagate (some) events that CloudFormation already gives you to the Stack resource in Kubernetes.
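As a rough illustration of what "succeeded or failed" could mean for a status field, here is a minimal Python sketch that buckets a CloudFormation stack status string by its suffix. The function name `classify_stack_status` is hypothetical and not part of the operator, and treating any rollback state as a failure is an assumption made for this example.

```python
def classify_stack_status(stack_status: str) -> str:
    """Bucket a CloudFormation stack status into a coarse outcome.

    Hypothetical helper for illustration only; not the operator's code.
    """
    # In-flight states (e.g. CREATE_IN_PROGRESS, ROLLBACK_IN_PROGRESS).
    if stack_status.endswith("_IN_PROGRESS"):
        return "in_progress"
    # Assumption: a finished rollback means the requested change failed.
    if stack_status.endswith("_FAILED") or "ROLLBACK" in stack_status:
        return "failed"
    # Terminal success states (e.g. CREATE_COMPLETE, UPDATE_COMPLETE).
    if stack_status.endswith("_COMPLETE"):
        return "succeeded"
    return "unknown"


if __name__ == "__main__":
    for s in ("CREATE_COMPLETE", "CREATE_FAILED", "UPDATE_ROLLBACK_COMPLETE"):
        print(s, "->", classify_stack_status(s))
```

A deploy tool that wants to wait for the real outcome could poll the Stack resource until this returns something other than `in_progress`.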
You could also head over to AWS' similar project aws-service-operator-k8s. It's being worked on by people from AWS. It's not released yet but the mvp branch seems active.
Thanks @linki, I thought the error management was handled at the Helm chart deployment stage, but it seems I was wrong. The Helm chart deploy would always be successful, whether or not the CF stack can be provisioned in AWS. I've also looked at the MySQL operator, as it requires lots of status updates for DB create/backup/restore, and they seem to use both k8s Events and status. Will dig further into this issue and see if we could implement a similar solution for CF provisioning. Cheers,
This issue may no longer be there. With the most recent merge, the resources and their current state (including in-flight statuses like CREATING, DELETING) are reflected. You will get the status and the description. So CREATE_FAILED or DELETE_FAILED resources will be there along with the text you'd get in CloudFormation.
Example:
status:
  createdTime: '2021-02-20T14:24:40Z'
  outputs:
    BucketName: my-bucket-s3bucket-yk25eg3bpemb
  resources:
    - logicalID: S3Bucket
      physicalID: my-bucket-s3bucket-yk25eg3bpemb
      status: DELETE_FAILED
      statusReason: >-
        The bucket you tried to delete is not empty. You must delete all
        versions in the bucket. (Service: Amazon S3; Status Code: 409; Error
        Code: BucketNotEmpty; Request ID: K6G45QRMK566VXZ8; S3 Extended Request
        ID:
        dF448D4fLMqBSTKykRa3NK1ToB8HpdJD0CsHDTp7Q0/Zmb2xD7HK8GjrLK7jyi9oCgzan+p1W+k=;
        Proxy: null)
      type: 'AWS::S3::Bucket'
  stackID: >-
    arn:aws:cloudformation:us-east-2:641875867446:stack/my-bucket/5e15ac70-7387-11eb-bc5a-062eed804cba
  stackStatus: DELETE_FAILED
  updatedTime: null
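For anyone consuming this status programmatically, the following Python sketch shows one way to pull the failed resources and their reasons out of such a status block. It assumes the status has already been loaded into a plain dict using the field names from the example above. The function name `failed_resources` is hypothetical, and the sample reason text is shortened.

```python
# Sample data mirroring the status example above (reason text shortened).
status = {
    "stackStatus": "DELETE_FAILED",
    "resources": [
        {
            "logicalID": "S3Bucket",
            "physicalID": "my-bucket-s3bucket-yk25eg3bpemb",
            "status": "DELETE_FAILED",
            "statusReason": "The bucket you tried to delete is not empty.",
            "type": "AWS::S3::Bucket",
        },
    ],
}


def failed_resources(status: dict) -> list:
    """Return (logicalID, statusReason) pairs for resources in a *_FAILED state.

    Hypothetical helper for illustration; tolerates missing keys.
    """
    return [
        (res.get("logicalID", ""), res.get("statusReason", ""))
        for res in status.get("resources", [])
        if res.get("status", "").endswith("_FAILED")
    ]


if __name__ == "__main__":
    for logical_id, reason in failed_resources(status):
        print(f"{logical_id}: {reason}")
```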
@cuppett That looks great. I would take the status section out of scope of this issue.
I don't remember if your PR already publishes some events which can be handy as well.
The merge we did already makes Status output which looks just like that per resource. :)
Yes, I saw that and it's awesome :slightly_smiling_face:
I'll leave this issue open for now because it's about publishing events as well.
CloudFormation has two features we don't expose in this operator: an SNS topic (for receiving events) and an IAM role to assume/assign on a per-stack basis. We could expose those? I almost added them. Would the SNS topic solve the need here? (One Stack could define the topic/subscription as part of a deployment for some pod that listens; any other stack created could then publish events to that topic for the pod to consume.)
We could create a special topic and subscribe in the operator, but I'm worried about orphaning those on uninstall/delete of cluster & capturing the potentially very long event stream in the CRD/Status section making the object unwieldy for API/etcd to store and mule around.
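One way to picture a limit on the event stream is a fixed-size buffer that keeps only the most recent events, so whatever ends up in the status section cannot grow without bound. The Python sketch below is an assumption-laden illustration: the class `RecentEvents`, its default cap of 20, and the event field names are all hypothetical and not part of the operator.

```python
from collections import deque


class RecentEvents:
    """Keep only the newest `max_events` stack events (hypothetical helper)."""

    def __init__(self, max_events: int = 20):
        # A deque with maxlen silently drops the oldest entry when full.
        self._events = deque(maxlen=max_events)

    def add(self, logical_id: str, status: str, reason: str = "") -> None:
        self._events.append(
            {"logicalID": logical_id, "status": status, "statusReason": reason}
        )

    def as_list(self) -> list:
        """Return the retained events, oldest first."""
        return list(self._events)


if __name__ == "__main__":
    events = RecentEvents(max_events=3)
    for i in range(5):
        events.add(f"Resource{i}", "CREATE_IN_PROGRESS")
    print([e["logicalID"] for e in events.as_list()])
```

The trade-off is that older events are lost from the object, so the full history would still have to be read from CloudFormation itself.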