Report CloudFormation events back to Stack resource

Open linki opened this issue 5 years ago • 10 comments

Events from AWS, especially when CF stack creation fails, should be surfaced on the Stack CRD via events or something similar.

linki avatar Oct 08 '19 09:10 linki

Hi @linki, we're trying to use the operator and Helm charts to deploy our CF stacks (the k8s workers are EC2 instances), and we observed that even if CF creation fails in AWS, the Helm chart deploy still shows "Deployed" (I can see the error messages in the Pod logs, though). I'm wondering if there's any way to sync the deploy status of the CRD with AWS, namely to have the CRD wait until the AWS CF creation completes before claiming a successful deploy. Is this related to this issue? We'd like to contribute to this project as well, but think it's better to discuss the options with you first. Cheers,

luoyimu1 avatar Jun 10 '20 00:06 luoyimu1

@luoyimu1 Yes, that's related to this issue.

Ideally, in the "status" section of the Stack resource we would have information about the real state of the CloudFormation Stack, including whether it succeeded or failed.

Furthermore, it probably makes sense to propagate (some) events that CloudFormation already gives you to the Stack resource in Kubernetes.
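
As a rough illustration of the propagation idea, the sketch below maps a CloudFormation status string to one of the two Kubernetes event types (Normal or Warning). The function name `eventTypeFor` and the suffix/substring rule are assumptions made for illustration; nothing here is taken from the operator's code.

```go
package main

import (
	"fmt"
	"strings"
)

// eventTypeFor maps a CloudFormation status string to a Kubernetes event
// type. Kubernetes defines two event types, "Normal" and "Warning".
// Assumption: any status ending in _FAILED or mentioning ROLLBACK is
// worth a Warning; everything else is reported as Normal.
func eventTypeFor(cfStatus string) string {
	if strings.HasSuffix(cfStatus, "_FAILED") || strings.Contains(cfStatus, "ROLLBACK") {
		return "Warning"
	}
	return "Normal"
}

func main() {
	statuses := []string{
		"CREATE_IN_PROGRESS",
		"CREATE_COMPLETE",
		"CREATE_FAILED",
		"ROLLBACK_IN_PROGRESS",
	}
	for _, s := range statuses {
		fmt.Printf("%s -> %s\n", s, eventTypeFor(s))
	}
}
```

An operator could pass the resulting type to whatever event recorder it uses, so that failed stacks show up as warnings when describing the Stack resource.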

linki avatar Jun 11 '20 16:06 linki

You could also head over to AWS' similar project aws-service-operator-k8s. It's being worked on by people from AWS. It's not released yet but the mvp branch seems active.

linki avatar Jun 11 '20 16:06 linki

Thanks @linki, I thought error management was handled at the Helm chart deployment stage, but it seems I was wrong. The Helm chart deploy always succeeds, whether or not the CF stack can be provisioned in AWS. I've also looked at the MySQL operator, since it requires lots of status updates for DB create/backup/restore, and they seem to use both k8s Events and status. We'll dig further into this issue and see if we could implement a similar solution for CF provisioning. Cheers,

luoyimu1 avatar Jun 15 '20 02:06 luoyimu1

@luoyimu1 Yes, that's related to this issue.

Ideally, in the "status" section of the Stack resource we would have information about the real state of the CloudFormation Stack, including whether it succeeded or failed.

Furthermore, it probably makes sense to propagate (some) events that CloudFormation already gives you to the Stack resource in Kubernetes.

This issue may no longer be there. With the most recent merge, the resources and their current state (including in-flight statuses like CREATING, DELETING) are reflected. You will get the status and the description. So CREATE_FAILED or DELETE_FAILED resources will be there along with the text you'd get in CloudFormation.

cuppett avatar Mar 22 '21 20:03 cuppett

@luoyimu1 Yes, that's related to this issue. Ideally, in the "status" section of the Stack resource we would have information about the real state of the CloudFormation Stack, including whether it succeeded or failed. Furthermore, it probably makes sense to propagate (some) events that CloudFormation already gives you to the Stack resource in Kubernetes.

This issue may no longer be there. With the most recent merge, the resources and their current state (including in-flight statuses like CREATING, DELETING) are reflected. You will get the status and the description. So CREATE_FAILED or DELETE_FAILED resources will be there along with the text you'd get in CloudFormation.

Example:


status:
  createdTime: '2021-02-20T14:24:40Z'
  outputs:
    BucketName: my-bucket-s3bucket-yk25eg3bpemb
  resources:
    - logicalID: S3Bucket
      physicalID: my-bucket-s3bucket-yk25eg3bpemb
      status: DELETE_FAILED
      statusReason: >-
        The bucket you tried to delete is not empty. You must delete all
        versions in the bucket. (Service: Amazon S3; Status Code: 409; Error
        Code: BucketNotEmpty; Request ID: K6G45QRMK566VXZ8; S3 Extended Request
        ID:
        dF448D4fLMqBSTKykRa3NK1ToB8HpdJD0CsHDTp7Q0/Zmb2xD7HK8GjrLK7jyi9oCgzan+p1W+k=;
        Proxy: null)
      type: 'AWS::S3::Bucket'
  stackID: >-
    arn:aws:cloudformation:us-east-2:641875867446:stack/my-bucket/5e15ac70-7387-11eb-bc5a-062eed804cba
  stackStatus: DELETE_FAILED
  updatedTime: null
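
A client reading this status could pick out the failed resources with something like the sketch below. The Go types `StackStatus` and `ResourceStatus` are hypothetical and only mirror the field names in the example; they are not the operator's actual API types.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceStatus mirrors one entry under status.resources in the example.
// Hypothetical type, for illustration only.
type ResourceStatus struct {
	LogicalID    string
	PhysicalID   string
	Type         string
	Status       string
	StatusReason string
}

// StackStatus mirrors a subset of the status section in the example.
// Hypothetical type, for illustration only.
type StackStatus struct {
	StackStatus string
	Resources   []ResourceStatus
}

// failedResources returns the resources whose status ends in "_FAILED",
// such as CREATE_FAILED or DELETE_FAILED.
func failedResources(s StackStatus) []ResourceStatus {
	var failed []ResourceStatus
	for _, r := range s.Resources {
		if strings.HasSuffix(r.Status, "_FAILED") {
			failed = append(failed, r)
		}
	}
	return failed
}

func main() {
	status := StackStatus{
		StackStatus: "DELETE_FAILED",
		Resources: []ResourceStatus{
			{
				LogicalID:    "S3Bucket",
				PhysicalID:   "my-bucket-s3bucket-yk25eg3bpemb",
				Type:         "AWS::S3::Bucket",
				Status:       "DELETE_FAILED",
				StatusReason: "The bucket you tried to delete is not empty.",
			},
		},
	}
	for _, r := range failedResources(status) {
		fmt.Printf("%s (%s): %s: %s\n", r.LogicalID, r.Type, r.Status, r.StatusReason)
	}
}
```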

cuppett avatar Mar 22 '21 20:03 cuppett

@cuppett That looks great. I would take the status section out of scope of this issue.

I don't remember if your PR already publishes some events which can be handy as well.

linki avatar Mar 24 '21 10:03 linki

The merge we did already produces Status output that looks just like that, per resource. :)

cuppett avatar Mar 24 '21 10:03 cuppett

Yes, I saw that and it's awesome :slightly_smiling_face:

I'll leave this issue open for now because it's about publishing events as well.

linki avatar Mar 24 '21 10:03 linki

Yes, I saw that and it's awesome

I'll leave this issue open for now because it's about publishing events as well.

CloudFormation has two features we don't expose in this operator: an SNS topic (for receiving events) and an IAM role to assume/assign on a per-stack basis. We could expose those? I almost added them. Would the SNS topic solve the need here? (One Stack could define the topic/subscription as part of a deployment for some pod that listens; any other stack created could then publish events to it for that pod to consume.)

We could create a special topic and subscribe in the operator, but I'm worried about orphaning those on uninstall/delete of cluster & capturing the potentially very long event stream in the CRD/Status section making the object unwieldy for API/etcd to store and mule around.
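
One way to limit the size concern, if events were ever stored on the object, would be to keep only the most recent few. The sketch below shows that idea; `appendBounded` and the cap of 3 are hypothetical, and this is not something the operator does today.

```go
package main

import "fmt"

// appendBounded appends ev to events and drops the oldest entries so that
// at most limit entries remain. Both the helper and the limit are
// assumptions for illustration.
func appendBounded(events []string, ev string, limit int) []string {
	events = append(events, ev)
	if len(events) > limit {
		// Keep only the newest `limit` entries.
		events = events[len(events)-limit:]
	}
	return events
}

func main() {
	var events []string
	for i := 1; i <= 5; i++ {
		events = appendBounded(events, fmt.Sprintf("event-%d", i), 3)
	}
	fmt.Println(events) // prints [event-3 event-4 event-5]
}
```

This bounds what the API server and etcd have to store, at the cost of losing older history, which would still be available in CloudFormation itself.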

cuppett avatar Mar 24 '21 11:03 cuppett