
Provide usable details on permission errors

Open bra-fsn opened this issue 1 year ago • 6 comments

Is your feature request related to a problem? I'm trying to create an S3 bucket with the S3 controller, but it fails. The controller logs this error:

2023-07-28T09:07:04.538Z	ERROR	Reconciler error	{"controller": "bucket", "controllerGroup": "s3.services.k8s.aws", "controllerKind": "Bucket", "Bucket": {"name":"test-bucket-name","namespace":"cluster-resources"}, "namespace": "cluster-resources", "name": "test-bucket-name", "reconcileID": "788046c2-8a99-4a3d-9619-4ce9d925d128", "error": "AccessDenied: Access Denied\n\tstatus code: 403, request id: F5VN6S14D1C1DMC4, host id: +bhom7Xc0sEoAexP7Xlx5XzLGxKzZusG1Xucs3OMyLXUvXWcJudGa6GK3QlHQq4PlJs6+nXN44I="}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235

Some permission is clearly missing (the role doesn't have full access, but it can definitely create and query a bucket), yet the log gives no hint about which operation failed.

Describe the solution you'd like It would be great if the error included more context about exactly what failed, so that the relevant policy could be updated to allow that operation.
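
As a rough illustration of the request, here is a minimal Go sketch that prefixes an SDK error with the name of the API operation that produced it. The helper `wrapOperation` and the stand-in value `errAccessDenied` are hypothetical names made up for this example; this is not the ACK runtime's actual code, only one way such context could be attached.

```go
package main

import (
	"errors"
	"fmt"
)

// errAccessDenied stands in for the opaque error returned by the AWS SDK.
// It is a hypothetical value used only for this illustration.
var errAccessDenied = errors.New("AccessDenied: Access Denied")

// wrapOperation prefixes err with the API operation that produced it.
// The %w verb keeps the original error reachable via errors.Is and errors.As.
func wrapOperation(operation string, err error) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: %w", operation, err)
}

func main() {
	err := wrapOperation("ListBuckets", errAccessDenied)
	fmt.Println(err)                             // ListBuckets: AccessDenied: Access Denied
	fmt.Println(errors.Is(err, errAccessDenied)) // true
}
```

With a message like this at the normal log level, the missing IAM action could be looked up directly from the operation name.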

Describe alternatives you've considered I've tried to identify the failing operation from the backtrace (which doesn't seem usable, at least for me) and tried finding it in CloudTrail, without success.

bra-fsn avatar Jul 28 '23 09:07 bra-fsn

Hello!

Which permissions did you assign to the S3 controller? I'm also asking because I wonder how to narrow its permissions down to only the necessary ones.

gecube avatar Jul 28 '23 10:07 gecube

Which permissions did you assign to the S3 controller? I'm also asking because I wonder how to narrow its permissions down to only the necessary ones.

I use a combination of tag-based and name-based permissions, implemented as a boundary policy. The controller itself has arn:aws:iam::aws:policy/AdministratorAccess, but the boundary policy limits that to s3:* on arn:aws:s3:::${var.short_prefix}-* resources (the new bucket I'm trying to create matches that pattern), among other things, and additionally allows:

actions   = ["*"]
resources = ["*"]
condition {
    test     = "ForAnyValue:StringEqualsIgnoreCase"
    variable = "aws:ResourceTag/Owner"
    values   = var.owner_tags
}

but that should be irrelevant here.

So I guess the controller either wants to perform a non-S3 operation (unlikely, since it works with the recommended policy, although that policy also grants s3-object-lambda:*, which mine doesn't), or it touches something outside the allowed resource names.

bra-fsn avatar Jul 28 '23 11:07 bra-fsn

BTW, turning on debug doesn't really help either:

2023-07-28T12:01:48.376Z	DEBUG	ackrt	> r.Sync	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	>> r.resetConditions	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	<< r.resetConditions	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	>> rm.ResolveReferences	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	<< rm.ResolveReferences	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	>> rm.EnsureTags	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	<< rm.EnsureTags	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	>> rm.ReadOne	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2}
2023-07-28T12:01:48.376Z	DEBUG	ackrt	>>> rm.sdkFind	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2}
2023-07-28T12:01:48.653Z	DEBUG	ackrt	<<< rm.sdkFind	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2, "error": "AccessDenied: Access Denied\n\tstatus code: 403, request id: TWDERKK1Z4VMQ2GT, host id: HmtlvkwX3I+iWlAWL6ock+324vpq+vxubvDecubj8djeXzo3smJ7vTVXknTXJHbz4Zk0C7QIysE="}
2023-07-28T12:01:48.653Z	DEBUG	ackrt	<< rm.ReadOne	{"account": "ACCOUNT", "role": "", "region": "us-east-1", "kind": "Bucket", "namespace": "cluster-resources", "name": "test-bucket-name", "is_adopted": false, "generation": 2, "error": "AccessDenied: Access Denied\n\tstatus code: 403, request id: TWDERKK1Z4VMQ2GT, host id: HmtlvkwX3I+iWlAWL6ock+324vpq+vxubvDecubj8djeXzo3smJ7vTVXknTXJHbz4Zk0C7QIysE="}

apart from showing that the failure seems to happen in the discovery phase.

Looking at the code in manager.go and sdk.go, I can now see what the problem is: the s3:ListAllMyBuckets permission is missing. The boundary policy didn't grant it because that action doesn't take a resource parameter, so the name-restricted s3:* statement doesn't cover it.
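
To make the reasoning concrete, here is a heavily simplified Go model of why a name-scoped statement allows CreateBucket but not ListAllMyBuckets. It assumes the ListAllMyBuckets request is evaluated against the resource "*", and it only models "*" and a single trailing "*" in resource patterns. The `statement` type, the helper names, and the `myprefix-` bucket prefix are all made up for this sketch; real IAM evaluation is considerably more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// statement is a simplified stand-in for one Allow statement.
type statement struct {
	actionPrefix    string // "s3:" models the action pattern "s3:*"
	resourcePattern string // "*" or an ARN with an optional trailing "*"
}

// matches reports whether pattern covers resource.
// Only "*" and a single trailing "*" are modelled here.
func matches(pattern, resource string) bool {
	if pattern == "*" {
		return true
	}
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(resource, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == resource
}

// allows reports whether any statement covers the action on the resource.
func allows(stmts []statement, action, resource string) bool {
	for _, s := range stmts {
		if strings.HasPrefix(action, s.actionPrefix) && matches(s.resourcePattern, resource) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical boundary: s3:* on buckets whose name starts with "myprefix-".
	boundary := []statement{{actionPrefix: "s3:", resourcePattern: "arn:aws:s3:::myprefix-*"}}

	fmt.Println(allows(boundary, "s3:CreateBucket", "arn:aws:s3:::myprefix-test")) // true
	fmt.Println(allows(boundary, "s3:ListAllMyBuckets", "*"))                      // false
}
```

In this model, adding a second statement with the resource pattern "*" for s3:ListAllMyBuckets is what makes the discovery call pass.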

So my problem is solved, but the issue remains: instead of having to switch the controller into debug mode and read the actual code, it would be nicer if the normal-level logs contained the exact operation that failed.

@gecube, you should be fine with something like this for the narrowed-down policy (you could limit s3:* even further; the required API calls should show up if you grep the source for RecordAPICall):

statement {
    sid = "Wildcard"

    actions = [
        "s3:ListAllMyBuckets", # s3 ACK
    ]
    resources = ["*"]
}

statement {
    sid = "S3"

    actions = [
        "s3:*",
    ]
    resources = [
        "arn:aws:s3:::list_of_allowed_s3_buckets",
    ]
}
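
The grep for RecordAPICall mentioned above can also be approximated in a few lines of Go. This is only a sketch: `linesMentioning` is a hypothetical helper, and `sampleSource` is invented input with the call arguments deliberately elided, since the exact shape of the real calls in the controller source is not shown in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// linesMentioning returns the trimmed lines of src that contain needle.
func linesMentioning(src, needle string) []string {
	var hits []string
	scanner := bufio.NewScanner(strings.NewReader(src))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.Contains(line, needle) {
			hits = append(hits, line)
		}
	}
	return hits
}

// sampleSource is invented input for the demo. It is NOT copied from the
// controller, and the elided arguments are intentionally left unspecified.
const sampleSource = `
first := 1
RecordAPICall(/* ... */ "ListBuckets" /* ... */)
second := 2
RecordAPICall(/* ... */ "CreateBucket" /* ... */)
`

func main() {
	// Print every line that mentions the recorder call.
	for _, hit := range linesMentioning(sampleSource, "RecordAPICall") {
		fmt.Println(hit)
	}
}
```

Run against the real source files instead of the sample string, the operation names that appear would give a starting list of S3 actions to allow.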

bra-fsn avatar Jul 28 '23 12:07 bra-fsn

Checking back in here. You were able to resolve this by adding the extra permission?

RedbackThomson avatar Aug 15 '23 18:08 RedbackThomson

Yes, it works. Although the issue is about the error message, which isn't really helpful.

bra-fsn avatar Aug 15 '23 20:08 bra-fsn

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Feb 12 '24 01:02 ack-bot

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot avatar Aug 10 '24 20:08 ack-bot

/remove-lifecycle stale

gecube avatar Aug 11 '24 15:08 gecube