iam-controller is not able to delete created IAM role

Open sylin218 opened this issue 2 years ago • 19 comments

Describe the bug

The IAM role isn't deleted after removing the role manifest for iam-controller 1.2.3. Below is the error message:

Message: DeleteConflict: Cannot delete entity, must delete policies first. status code: 409, request id: ebc1f8a1-68ed-4fbc-baed-572c3de80960

Steps to reproduce

  1. Create an IAM Role manifest with inlinePolicies
  2. Create the IAM role from the manifest in step 1
  3. Delete the role manifest

Expected outcome

The IAM role should be deleted after removing the role manifest.

Environment

  • Kubernetes version 1.24
  • Using EKS (yes/no), if so version? eks.10
  • AWS service targeted (S3, RDS, etc.)
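
For context on the DeleteConflict error: IAM rejects DeleteRole while the role still has inline policies, attached managed policies, or instance-profile memberships. The following is a minimal sketch of that cleanup order, not the controller's actual implementation. It assumes `iam` is a boto3 IAM client (or any object exposing the same method names); `delete_role_with_dependencies` and `_collect` are hypothetical helper names.

```python
def _collect(list_call, key, **kwargs):
    """Gather every item from a Marker-paginated IAM list call."""
    items = []
    while True:
        page = list_call(**kwargs)
        items.extend(page.get(key, []))
        if not page.get("IsTruncated"):
            return items
        kwargs["Marker"] = page["Marker"]


def delete_role_with_dependencies(iam, role_name):
    """Remove what IAM requires to be gone, then delete the role itself."""
    # 1. Inline policies ("must delete policies first").
    for name in _collect(iam.list_role_policies, "PolicyNames", RoleName=role_name):
        iam.delete_role_policy(RoleName=role_name, PolicyName=name)

    # 2. Attached managed policies.
    for policy in _collect(
        iam.list_attached_role_policies, "AttachedPolicies", RoleName=role_name
    ):
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])

    # 3. Instance profiles ("must remove roles from instance profile first").
    for profile in _collect(
        iam.list_instance_profiles_for_role, "InstanceProfiles", RoleName=role_name
    ):
        iam.remove_role_from_instance_profile(
            InstanceProfileName=profile["InstanceProfileName"], RoleName=role_name
        )

    # 4. Only now can DeleteRole succeed.
    iam.delete_role(RoleName=role_name)
```

With boto3 installed this would be called as `delete_role_with_dependencies(boto3.client("iam"), "my-role")`. Step 3 changes what running EC2 instances can assume, so it is something to do deliberately rather than automatically.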

sylin218 avatar Sep 28 '23 22:09 sylin218

Hi @sylin218 can you share an example CR of the resource you're trying to create/delete?

a-hilaly avatar Sep 28 '23 23:09 a-hilaly

Hi @sylin218 !

Please give a more specific example; I checked multiple times and did not run into any problems with this function...

gecube avatar Oct 04 '23 11:10 gecube

iam-role.txt

Hey, since GitHub doesn't support .yaml attachments, I changed the manifest extension to .txt. I also redacted some sensitive info.

sylin218 avatar Oct 05 '23 03:10 sylin218

Hey @gecube @a-hilaly do we have any updates here?

sylin218 avatar Oct 20 '23 22:10 sylin218

@sylin218 Hi! Please don't be confused :-) I am not a developer of ACK, just a passionate user :-) I will try to reproduce your issue in the next few days.

gecube avatar Oct 21 '23 05:10 gecube

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Apr 18 '24 07:04 ack-bot

/remove-lifecycle stale

gecube avatar Apr 18 '24 07:04 gecube

@sylin218 @gecube can either of you confirm that indeed this is still a bug with the latest version of the iam-controller?

a-hilaly avatar Apr 18 '24 12:04 a-hilaly

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Oct 15 '24 17:10 ack-bot

/remove-lifecycle stale

gecube avatar Oct 16 '24 07:10 gecube

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Apr 14 '25 07:04 ack-bot

/remove-lifecycle stale

gecube avatar Apr 24 '25 19:04 gecube

> @sylin218 @gecube can either of you confirm that indeed this is still a bug with the latest version of the iam-controller?

@gecube is this still a bug?

michaelhtm avatar Apr 24 '25 19:04 michaelhtm

@michaelhtm Hi! Thanks for the reminder.

So I am checking what's going on.

First observation:

There is one role attached to an EC2 instance profile, and as the EC2 instances still exist, I cannot remove the role:

{"level":"error","ts":"2025-04-25T06:45:00.738Z","msg":"Reconciler error","controller":"role","controllerGroup":"iam.services.k8s.aws","controllerKind":"Role","Role":{"name":"ec2-ledger","namespace":"infra-production"},"namespace":"infra-production","name":"ec2-ledger","reconcileID":"3b40209c-9b62-42a1-8348-aacbaee65044","error":"operation error IAM: DeleteRole, https response error StatusCode: 409, RequestID: eef2fdca-b01b-4568-a7c5-526396d1774b, DeleteConflict: Cannot delete entity, must remove roles from instance profile first.","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255"}

It looks like a limitation of the AWS API, so there is not much we can do about it; the only option would be to emit a proper metric with an alert.

And yes, I see the proper condition in the status field of Kind: Role

...
    - message: 'DeleteConflict: Cannot delete entity, must remove roles from instance profile first.'
      status: 'True'
      type: ACK.Recoverable
...
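
Until such a metric exists, the controller's JSON logs can be tallied as a stopgap. This is a rough sketch, assuming one JSON object per log line in the shape shown above; `summarize_reconcile_errors` is a hypothetical helper, and the regular expressions are fitted only to the two error strings quoted in this thread.

```python
import json
import re
from collections import Counter

# "operation error IAM: DeleteRole," -> service "IAM", operation "DeleteRole"
_OPERATION = re.compile(r"operation error (\w+): (\w+),")
# The AWS error code follows the request ID, optionally prefixed by "api error".
_ERROR_CODE = re.compile(r"RequestID: [0-9a-f-]+, (?:api error )?(\w+): ")


def summarize_reconcile_errors(lines):
    """Count reconciler errors per (namespace, name, operation, error code)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if not isinstance(entry, dict) or entry.get("msg") != "Reconciler error":
            continue
        error = entry.get("error", "")
        operation = _OPERATION.search(error)
        code = _ERROR_CODE.search(error)
        key = (
            entry.get("namespace"),
            entry.get("name"),
            operation.group(2) if operation else "unknown",
            code.group(1) if code else "unknown",
        )
        counts[key] += 1
    return counts
```

Feeding it the controller's log output (for example, the lines read from `sys.stdin`) gives a per-resource count that an alerting rule could threshold on.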

Second observation.

It looks like I have an incorrect role referring to a nonexistent policy in the dev environment:

{"level":"error","ts":"2025-04-25T06:47:36.802Z","msg":"Reconciler error","controller":"role","controllerGroup":"iam.services.k8s.aws","controllerKind":"Role","Role":{"name":"teleport-role","namespace":"infra-dev"},"namespace":"infra-dev","name":"teleport-role","reconcileID":"292760ce-e56a-40c0-afb8-9751f8cc89f2","error":"operation error IAM: CreateRole, https response error StatusCode: 404, RequestID: 0a850a3d-31c8-401f-a9b0-a4e1485cd306, api error NoSuchEntity: Scope ARN: arn:aws:iam::474417630776:policy/DatabaseDiscoveryBoundary does not exist or is not attachable.","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255"}

Again - the condition is set properly:

...
  conditions:
    - message: 'api error NoSuchEntity: Scope ARN: arn:aws:iam::474417630776:policy/DatabaseDiscoveryBoundary does not exist or is not attachable.'
      status: 'True'
      type: ACK.Recoverable
    - lastTransitionTime: '2025-04-25T06:48:59Z'
      message: Unable to determine if desired resource state matches latest observed state
      reason: 'operation error IAM: CreateRole, https response error StatusCode: 404, RequestID: 9bafa841-084c-4f49-ad0f-2c13ec984b55, api error NoSuchEntity: Scope ARN: arn:aws:iam::474417630776:policy/DatabaseDiscoveryBoundary does not exist or is not attachable.'
      status: Unknown
      type: ACK.ResourceSynced
...

It is definitely my error: when copying manifests between catalogues, I forgot to change the AWS account ID. We should probably find a better way to refer to objects in the same account than rewriting the JSON from scratch. I would be grateful if we could find such an approach. For now I decided to just delete these roles and the associated resources.
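
A copy-paste slip like this can be caught before applying the manifests. The following is a small sketch, assuming the manifests are available as text and the target account ID is known; `find_foreign_account_arns` is a hypothetical helper, and the pattern only covers the standard `aws` partition.

```python
import re

# IAM ARNs carrying a 12-digit account ID. AWS-managed policies use
# "arn:aws:iam::aws:policy/..." and are deliberately not matched.
_IAM_ARN = re.compile(r"arn:aws:iam::(\d{12}):[A-Za-z0-9/_+=,.@-]+")


def find_foreign_account_arns(manifest_text, expected_account_id):
    """Return the sorted IAM ARNs whose account ID differs from the expected one."""
    return sorted(
        {
            match.group(0)
            for match in _IAM_ARN.finditer(manifest_text)
            if match.group(1) != expected_account_id
        }
    )
```

Running this over each environment's manifests in CI, with that environment's account ID, would flag a leftover ARN such as the DatabaseDiscoveryBoundary one above before the controller ever sees it.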

Third observation.

Regarding the original topic: I removed about 20 different roles in different accounts, and I can confirm that the original issue now appears to be resolved.

gecube avatar Apr 25 '25 07:04 gecube

Hey @gecube, from the response I gather that the deletion of roles is working as expected.

For the second observation, where you mention that you had to rewrite the JSON from scratch and were looking for a better approach: would you mind sharing your manifest? The Policy can be defined as a separate resource and then referenced. That should eliminate having to rewrite it.

rushmash91 avatar May 07 '25 21:05 rushmash91

@rushmash91 Hi! Thanks for your comment. I think that when you define the Policy as a separate Kind:

  • first of all, it does not solve the issue that I want to feed the policy as a plain JSON file from the k8s repo;
  • secondly, it creates unnecessary objects in AWS, which I need to manage and care about.

What would be really cool is if I could put the policy as a file into a Secret or ConfigMap and reference it like

...
   policyRef:
   - kind: ConfigMap
     name: my-great-policy
     key: policy.json
...

because it is very easy to deploy plain config files to k8s with kustomize. Otherwise I would need to write a Helm chart to wrap an IAM Role.
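
As a stopgap without any new controller feature, the policy JSON files can be inlined into the Role manifest at build time. This is a sketch only: it assumes the Role spec accepts `name`, `assumeRolePolicyDocument`, and an `inlinePolicies` map of policy name to JSON string under `iam.services.k8s.aws/v1alpha1`, and `render_role_manifest` is a hypothetical helper, not part of ACK.

```python
import json


def render_role_manifest(role_name, namespace, assume_role_policy, inline_policies):
    """Build a Role manifest dict with policy documents inlined as JSON strings.

    assume_role_policy: the trust policy as a dict.
    inline_policies: mapping of policy name -> policy document dict.
    """
    return {
        "apiVersion": "iam.services.k8s.aws/v1alpha1",  # assumed API version
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "spec": {
            "name": role_name,
            "assumeRolePolicyDocument": json.dumps(assume_role_policy),
            "inlinePolicies": {
                name: json.dumps(document)
                for name, document in inline_policies.items()
            },
        },
    }
```

The policy dicts would come from `json.load` on the files kept in the repo, and `json.dumps(manifest, indent=2)` yields output that `kubectl apply -f` accepts, since JSON is valid YAML.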

gecube avatar May 12 '25 07:05 gecube

Anyway, I don't like a separate Policy because of the delays between cycles inside the IAM controller and all the other controllers. I observed the following behaviour: if you create an EKS cluster with the EKS controller and some of the referenced resources are not ready, the cluster is not created until all prerequisite resources are ready. That can take a really long time, during which the controller constantly spams dozens of meaningless messages.

gecube avatar May 12 '25 07:05 gecube

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Nov 08 '25 10:11 ack-bot

/remove-lifecycle stale

gecube avatar Nov 08 '25 17:11 gecube