MSK: operation error Kafka: CreateCluster, https response error StatusCode: 403
Describe the bug
arn:aws:iam::aws:policy/AmazonMSKFullAccess attached with Pod Identity results in:
{
  "level": "error",
  "ts": "2025-05-09T05:31:01.021Z",
  "msg": "Reconciler error",
  "controller": "cluster",
  "controllerGroup": "kafka.services.k8s.aws",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "cluster-name",
    "namespace": "ack-system"
  },
  "namespace": "ack-system",
  "name": "x",
  "reconcileID": "7680a7be-2523-4689-9268-0c04a18db412",
  "error": "operation error Kafka: CreateCluster, https response error StatusCode: 403, RequestID: 3bba50f8-f56f-4d73-a50f-23eef5249e01, api error AccessDeniedException: User: xxx is not authorized to perform: kafka:CreateCluster on resource: *",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255"
}
Steps to reproduce
Expected outcome: Create cluster
Environment
- Kubernetes version 1.31
- Using EKS - yes
- AWS service - MSK
Hello @kappa8219 👋 Thank you for opening an issue in ACK! A maintainer will triage this issue soon.
We encourage community contributions, so if you're interested in tackling this yourself or suggesting a solution, please check out our Contribution and Code of Conduct guidelines.
You can find more information about ACK on our website.
Same as for closed Issue #2074
Hi! Please kindly check with full admin perms.
Strange, but it is still the same with the policy arn:aws:iam::aws:policy/AdministratorAccess:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
{
  "level": "error",
  "ts": "2025-05-13T07:42:48.221Z",
  "msg": "Reconciler error",
  "controller": "cluster",
  "controllerGroup": "kafka.services.k8s.aws",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "xxxx",
    "namespace": "ack-system"
  },
  "namespace": "ack-system",
  "name": "xxxxx",
  "reconcileID": "337fb40f-400e-4d38-92c7-e7e35b418320",
  "error": "operation error Kafka: CreateCluster, https response error StatusCode: 403, RequestID: 3a37877e-b0d6-428f-8d9e-63b5e1d42f17, api error AccessDeniedException: User: arn:aws:sts::xxxx:assumed-role/ack-controllers-kafka/eks-terra-clus-ack-contro-9cd673e7-892d-4e7d-8dac-ee1119aa67b0 is not authorized to perform: kafka:CreateCluster on resource: *",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255"
}
@kappa8219 did you restart the controller after changing the perms? Very strange, as it worked for me two weeks ago.
@gecube Sure, restarted. I use Pod Identity as the role attachment mechanism. But all other controllers assume roles the same way, and they are all fine.
Ah... I used IRSA... I still have not switched to Pod Identity.
Any updates on this investigation? I'm facing similar issues on my stack too
@vinicius91 I dropped MSK and switched to Strimzi (since we have EKS). I recommend everybody consider this option.
But if we are talking about MSK and it is a hard requirement to use it, then unfortunately I did not make progress with the investigation.
Strimzi was our first candidate to provision Kafka since we are also on EKS, but we decided to give MSK a try, assuming it would be a smoother experience since we already use other ACK controllers. So far it has been the opposite.
Would you say that Strimzi is on the same complexity level as the Provisioned Standard when it comes to storage management?
@vinicius91 I think Strimzi gives a better experience, at least from the maintenance and cost perspective. But no, this is not meant as anti-advertisement against managed Amazon services and MSK in particular :-)
Yes, but it would be nice if they put more effort into the product. This issue, unaddressed since 2024, is bad advertisement in itself :(
Hello @kappa8219 @vinicius91
I was unable to replicate this issue. I'm also using Pod Identity with the AmazonMSKFullAccess permission, and was able to create the MSK Cluster successfully.
I'm using v1.2.1.
Not sure what the issue could be here.
Hm, I will also retry with the modern version. Also, 4.1 is out; it is interesting to see queues in Kafka.
Still no success :(
{
  "level": "error",
  "ts": "2025-10-24T05:19:33.504Z",
  "msg": "Reconciler error",
  "controller": "cluster",
  "controllerGroup": "kafka.services.k8s.aws",
  "controllerKind": "Cluster",
  "Cluster": {
    "name": "xxx-dev-eks-app",
    "namespace": "ack-system"
  },
  "namespace": "ack-system",
  "name": "xxx-dev-eks-app",
  "reconcileID": "8d29b64d-3355-4cb5-9fe0-ad72ec16cfd3",
  "error": "operation error Kafka: CreateCluster, https response error StatusCode: 403, RequestID: ID, api error AccessDeniedException: User: arn:aws:sts::xxx:assumed-role/ack-controllers-kafka/eks-terra-clus-ack-contro-xxx is not authorized to perform: kafka:CreateCluster on resource: *",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255"
}
The Pod Identity association seems to be fine (at least such a config works for many other controllers). The role also looks attached, with both AdministratorAccess and AmazonMSKFullAccess.
Here is my cluster config:
apiVersion: kafka.services.k8s.aws/v1alpha1
kind: Cluster
metadata:
  name: dev-eks-app
  namespace: ack-system
spec:
  name: dev-eks-app
  kafkaVersion: "4.1.x.kraft"
  numberOfBrokerNodes: 2
  brokerNodeGroupInfo:
    instanceType: "kafka.m7g.large"
    clientSubnets:
      - subnet-xxx
      - subnet-yyy
    securityGroups:
      - sg-zzz
    storageInfo:
      ebsStorageInfo:
        volumeSize: 200
        provisionedThroughput:
          enabled: false
  encryptionInfo:
    encryptionInTransit:
      clientBroker: "TLS_PLAINTEXT"
      inCluster: true
  enhancedMonitoring: "DEFAULT"
  loggingInfo:
    brokerLogs:
      cloudWatchLogs:
        enabled: true
        logGroup: dev-mks-logs-app
  configurationInfo:
    arn: "aws_msk_configuration.dev-cluster-configuration-eks-2node-kraft.arn"
    revision: 1
  openMonitoring:
    prometheus:
      jmxExporter:
        enabledInBroker: true
      nodeExporter:
        enabledInBroker: true
---
apiVersion: kafka.services.k8s.aws/v1alpha1
kind: Configuration
metadata:
  name: dev-cluster-configuration-eks-2node-kraft2
  namespace: ack-system
spec:
  name: "dev-cluster-configuration-eks-2node-kraft2"
  kafkaVersions:
    - "4.1.x.kraft"
  serverProperties: MY_HASH_CONFIG
---
Controller: public.ecr.aws/aws-controllers-k8s/kafka-controller:1.2.1
Values for the eks and kafka controller Helm charts, which set up the pod identity association:
eks:
  enabled: true
  aws:
    region: us-east-1
  deployment:
    tolerations:
      - effect: NoSchedule
        key: ng
        operator: Equal
        value: ccc
    nodeSelector:
      NodeType: ccc
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/ack-eks-controller
    name: ack-eks-controller
kafka:
  enabled: true
  aws:
    region: us-east-1
  deployment:
    tolerations:
      - effect: NoSchedule
        key: ng
        operator: Equal
        value: ctrls
    nodeSelector:
      NodeType: ctrls
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/ack-controllers-kafka
    name: ack-controllers-kafka
apiVersion: eks.services.k8s.aws/v1alpha1
kind: PodIdentityAssociation
metadata:
  name: pod-identity-association-controllers-kafka
  namespace: ack-system
spec:
  clusterName: mycluster
  namespace: ack-system
  roleARN: arn:aws:iam::xxx:role/ack-controllers-kafka
  serviceAccount: ack-controllers-kafka
Finally, the role:
apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
  name: ack-controllers-kafka
  namespace: ack-system
spec:
  name: ack-controllers-kafka
  assumeRolePolicyDocument: |
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowEksAuthToAssumeRoleForPodIdentity",
          "Effect": "Allow",
          "Principal": {
            "Service": "pods.eks.amazonaws.com"
          },
          "Action": [
            "sts:AssumeRole",
            "sts:TagSession"
          ]
        }
      ]
    }
  policies:
    - arn:aws:iam::aws:policy/AdministratorAccess
    - arn:aws:iam::aws:policy/AmazonMSKFullAccess
Update: I tried on EKS 1.34 - same thing. What confuses me is that Configuration creation works, so the role-assume mechanism is definitely fine. But what causes CreateCluster to fail is not clear.
Works:
apiVersion: kafka.services.k8s.aws/v1alpha1
kind: Configuration
Does not:
apiVersion: kafka.services.k8s.aws/v1alpha1
kind: Cluster
@michaelhtm, @vinicius91 any ideas what to try? Maybe add some debug output to the controller?
@kappa8219 can you try assuming the PodIdentity role in your terminal and creating the cluster using the aws cli? I just saw this https://repost.aws/questions/QUszJm_J6pR32y7qdpO9oAng/is-not-authorized-to-perform-kafka-createcluster where someone is running into the same issue when using the cli.
@michaelhtm Interesting case in that repost thread, but not mine.
I can create the cluster both with the AWS web console and the cli:
aws kafka create-cluster --cluster-name test-msk2 --kafka-version "4.1.x.kraft" --number-of-broker-nodes 2 --broker-node-group-info '{
  "InstanceType": "kafka.m5.large",
  "ClientSubnets": ["subnet-x", "subnet-y"]
}'
{
  "ClusterArn": "arn:aws:kafka:us-east-1:xxx:cluster/test-msk2/xxx",
  "ClusterName": "test-msk2",
  "State": "CREATING"
}
Still, from the ACK controller it fails.
One more thing: I remember that pod identity works only with the AWS SDK v2 client libraries. But this controller is based on a quite recent ACK runtime, so the libraries should be up to date.
@michaelhtm, @vinicius91 I finally discovered what was wrong. It was the:
configurationInfo:
arn: "FULL_CONFIG_ARN"
The value I had here was not correct - sorry, my bad. The one thing in my excuse is the error message: it is not correct for this case. Maybe linking the Configuration by name, not by ARN, would help avoid such cases.
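A quick local sanity check could have caught this earlier: the value pasted into configurationInfo.arn was a Terraform reference string, not an ARN. A rough sketch; the regex only approximates the arn:aws:kafka:&lt;region&gt;:&lt;account&gt;:configuration/&lt;name&gt;/&lt;uuid&gt; shape (it is not the exact AWS grammar), and `looks_like_msk_config_arn` is a made-up helper:

```python
import re

# Rough shape of an MSK configuration ARN (hedged; see AWS docs for the exact grammar):
#   arn:aws:kafka:<region>:<account-id>:configuration/<name>/<uuid>
MSK_CONFIG_ARN = re.compile(
    r"^arn:aws:kafka:[a-z0-9-]+:\d{12}:configuration/[^/]+/[0-9a-f-]+$"
)

def looks_like_msk_config_arn(value: str) -> bool:
    return MSK_CONFIG_ARN.match(value) is not None

# The Terraform-style reference from the manifest above fails the check:
print(looks_like_msk_config_arn(
    "aws_msk_configuration.dev-cluster-configuration-eks-2node-kraft.arn"))  # False
print(looks_like_msk_config_arn(
    "arn:aws:kafka:us-east-1:111122223333:configuration/dev-conf/"
    "12345678-1234-1234-1234-123456789012"))  # True
```

Such a check (or a webhook doing the same) on the Cluster spec would turn the misleading AccessDeniedException into an obvious validation error.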
Thanks for the update folks, I'll give it a try on my end