Failing to connect to MSK from Localhost

Open abezard opened this issue 3 years ago • 10 comments

Hi everyone,

I apologize in advance, as this has probably been asked many times, but I couldn't find a working answer to my issue in any of the existing open or closed issues.

I'm simply trying to connect to an AWS-managed Kafka cluster from my localhost, and it fails even though I'm using very broad permissions for testing purposes.

My user has the following permissions in AWS:

{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}

I'm using the following ~/.aws/config file:

[default]
aws_access_key_id = <KEY>
aws_secret_access_key = <ACCESS_KEY>

and the following env.cfg file:

KAFKA_CLUSTERS_0_NAME=kafka-test
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=b-3.<URL>.kafka.us-west-1.amazonaws.com:9098,b-2.<URL>.kafka.us-west-1.amazonaws.com:9098,b-1.<URL>.kafka.us-west-1.amazonaws.com:9098
KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL=SASL_SSL
KAFKA_CLUSTERS_0_PROPERTIES_SASL_MECHANISM=AWS_MSK_IAM
KAFKA_CLUSTERS_0_PROPERTIES_SASL_CLIENT_CALLBACK_HANDLER_CLASS=software.amazon.msk.auth.iam.IAMClientCallbackHandler
KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG=software.amazon.msk.auth.iam.IAMLoginModule required awsProfileName="default";
  • The Kafka cluster in AWS is correctly configured
  • I can bind to the cluster on port 9098 from my localhost
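For readers comparing the env file with the AdminClientConfig dump further down, here is a minimal Python sketch of how the `KAFKA_CLUSTERS_0_PROPERTIES_*` variables appear to line up with Kafka client property names. The mapping rule (strip the prefix, lowercase, turn underscores into dots) is an assumption inferred from the values echoed in the log, not taken from the kafka-ui source, and `env_to_kafka_properties` is a hypothetical helper name.

```python
# Sketch only: the naming rule below is an assumption inferred from the log
# output in this issue, not kafka-ui's actual binding code.
PREFIX = "KAFKA_CLUSTERS_0_PROPERTIES_"


def env_to_kafka_properties(env_lines):
    """Map env-file lines with the PROPERTIES prefix to dotted Kafka keys."""
    props = {}
    for line in env_lines:
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only, because values (e.g. the JAAS config)
        # can contain "=" themselves.
        name, value = line.split("=", 1)
        if name.startswith(PREFIX):
            key = name[len(PREFIX):].lower().replace("_", ".")
            props[key] = value
    return props


env_lines = [
    "KAFKA_CLUSTERS_0_NAME=kafka-test",
    "KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL=SASL_SSL",
    "KAFKA_CLUSTERS_0_PROPERTIES_SASL_MECHANISM=AWS_MSK_IAM",
    "KAFKA_CLUSTERS_0_PROPERTIES_SASL_JAAS_CONFIG="
    'software.amazon.msk.auth.iam.IAMLoginModule required awsProfileName="default";',
]
props = env_to_kafka_properties(env_lines)
```

Under that assumption, `props` holds `security.protocol`, `sasl.mechanism` and `sasl.jaas.config`, which are the same keys that show up in the AdminClientConfig values in the log.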

But for some reason I'm still getting a Caused by: org.apache.kafka.common.errors.ClusterAuthorizationException: Cluster authorization failed. error when launching kafka-ui:

docker run --rm --name kafka-1 \
  --env-file /tmp/env.cfg \
  --volume ~/.aws:/home/kafkaui/.aws \
  --publish 8080:8080 \
  provectuslabs/kafka-ui:latest
2022-06-18 01:33:33,324 INFO  [background-preinit] o.h.v.i.u.Version: HV000001: Hibernate Validator 6.2.0.Final
2022-06-18 01:33:33,363 INFO  [main] c.p.k.u.KafkaUiApplication: Starting KafkaUiApplication using Java 13.0.9 on 19c7562c6457 with PID 1 (/kafka-ui-api.jar started by kafkaui in /)
2022-06-18 01:33:33,364 DEBUG [main] c.p.k.u.KafkaUiApplication: Running with Spring Boot v2.6.3, Spring v5.3.15
2022-06-18 01:33:33,365 INFO  [main] c.p.k.u.KafkaUiApplication: No active profile set, falling back to default profiles: default
2022-06-18 01:33:36,885 INFO  [main] o.s.d.r.c.RepositoryConfigurationDelegate: Bootstrapping Spring Data LDAP repositories in DEFAULT mode.
2022-06-18 01:33:36,980 INFO  [main] o.s.d.r.c.RepositoryConfigurationDelegate: Finished Spring Data repository scanning in 76 ms. Found 0 LDAP repository interfaces.
2022-06-18 01:33:38,178 INFO  [main] c.p.k.u.s.DeserializationService: Using SimpleRecordSerDe for cluster 'kafka-test'
2022-06-18 01:33:39,438 INFO  [main] o.s.b.a.e.w.EndpointLinksResolver: Exposing 2 endpoint(s) beneath base path '/actuator'
2022-06-18 01:33:39,692 INFO  [main] o.s.b.a.s.r.ReactiveUserDetailsServiceAutoConfiguration: 

Using generated security password: <TMP_PASSWORD>

2022-06-18 01:33:39,893 WARN  [main] c.p.k.u.c.a.DisabledAuthSecurityConfig: Authentication is disabled. Access will be unrestricted.
2022-06-18 01:33:40,240 INFO  [main] o.s.l.c.s.AbstractContextSource: Property 'userDn' not set - anonymous context will be used for read-write operations
2022-06-18 01:33:40,828 INFO  [main] o.s.b.w.e.n.NettyWebServer: Netty started on port 8080
2022-06-18 01:33:40,867 INFO  [main] c.p.k.u.KafkaUiApplication: Started KafkaUiApplication in 8.804 seconds (JVM running for 9.802)
2022-06-18 01:33:40,917 DEBUG [parallel-1] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: kafka-test
2022-06-18 01:33:40,951 INFO  [parallel-1] o.a.k.c.a.AdminClientConfig: AdminClientConfig values: 
	bootstrap.servers = [b-3.<URL>.kafka.us-west-1.amazonaws.com:9098,b-2.<URL>.kafka.us-west-1.amazonaws.com:9098,b-1.<URL>.kafka.us-west-1.amazonaws.com:9098]
	client.dns.lookup = use_all_dns_ips
	client.id = 
	connections.max.idle.ms = 300000
	default.api.timeout.ms = 60000
	metadata.max.age.ms = 300000
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	receive.buffer.bytes = 65536
	reconnect.backoff.max.ms = 1000
	reconnect.backoff.ms = 50
	request.timeout.ms = 30000
	retries = 2147483647
	retry.backoff.ms = 100
	sasl.client.callback.handler.class = class software.amazon.msk.auth.iam.IAMClientCallbackHandler
	sasl.jaas.config = [hidden]
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism = AWS_MSK_IAM
	security.protocol = SASL_SSL
	security.providers = null
	send.buffer.bytes = 131072
	socket.connection.setup.timeout.max.ms = 30000
	socket.connection.setup.timeout.ms = 10000
	ssl.cipher.suites = null
	ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
	ssl.endpoint.identification.algorithm = https
	ssl.engine.factory.class = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.certificate.chain = null
	ssl.keystore.key = null
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLSv1.3
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.certificates = null
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS

2022-06-18 01:33:41,488 INFO  [parallel-1] o.a.k.c.s.a.AbstractLogin: Successfully logged in.
2022-06-18 01:33:41,721 INFO  [parallel-1] o.a.k.c.u.AppInfoParser: Kafka version: 2.8.0
2022-06-18 01:33:41,721 INFO  [parallel-1] o.a.k.c.u.AppInfoParser: Kafka commitId: ebb1d6e21cc92130
2022-06-18 01:33:41,721 INFO  [parallel-1] o.a.k.c.u.AppInfoParser: Kafka startTimeMs: 1655516021716
2022-06-18 01:33:46,305 ERROR [parallel-2] c.p.k.u.s.MetricsService: Failed to collect cluster kafka-test info
java.lang.IllegalStateException: Error while creating AdminClient for Cluster kafka-test
	at com.provectus.kafka.ui.service.AdminClientServiceImpl.lambda$createAdminClient$3(AdminClientServiceImpl.java:45)
	at reactor.core.publisher.Mono.lambda$onErrorMap$31(Mono.java:3733)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onError(FluxMapFuseable.java:140)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
	at reactor.core.publisher.MonoPublishOn$PublishOnSubscriber.run(MonoPublishOn.java:187)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: org.apache.kafka.common.errors.ClusterAuthorizationException: Cluster authorization failed.

Thanks again! Sorry for asking such a basic question. I also tried assuming a role with full access but got similar results.

Doc reference - https://github.com/provectus/kafka-ui/blob/master/documentation/guides/AWS_IAM.md

abezard avatar Jun 18 '22 01:06 abezard

Hello there abezard! 👋

Thank you and congratulations 🎉 for opening your very first issue in this project! 💖

In case you want to claim this issue, please comment down below! We will try to get back to you as soon as we can. 👀

github-actions[bot] avatar Jun 18 '22 01:06 github-actions[bot]

Hey, thanks for reaching out.

No worries, I guess we still don't have MSK permissions documented well enough if we keep getting questions :)

Please check this FAQ paragraph. Also these: one, two.

Hope it helps, let me know how it goes.

Haarolean avatar Jun 24 '22 21:06 Haarolean

@Haarolean Thanks for your answer, but I'm not sure I understand the point of those links. They all describe IAM permissions, which should be completely covered by the existing policy attached to my user (as described in the first message of this issue):

{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}

abezard avatar Jun 25 '22 01:06 abezard

Hello @abezard. As far as I can see, you would like to access MSK from your local PC, which is why public access is required. According to https://docs.aws.amazon.com/msk/latest/developerguide/port-info.html, for public access you should use broker port number 9198. It is also important to know that public MSK access must be enabled in advance, and it has some requirements. Public endpoints are different from private ones; this article explains how to find the endpoints. For example:

private - b-2.cluster-name.uniq-id.c5.kafka.eu-central-1.amazonaws.com:9098
public - b-2-public.cluster-name.uniq-id.c5.kafka.eu-central-1.amazonaws.com:9198
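To make the difference between the two example endpoints concrete, here is a small Python sketch that rewrites a private broker address into the public form shown above. The `-public` suffix on the broker label and the 9198 port are taken from the example in this comment; the helper name `to_public_endpoint` is hypothetical, and the real endpoints should always be read from the cluster's client information rather than derived this way.

```python
def to_public_endpoint(private_endpoint, public_port=9198):
    """Rewrite 'b-N.<rest>:9098' into 'b-N-public.<rest>:9198'.

    Assumption: the public hostname only differs by a '-public' suffix on the
    first DNS label, as in the example endpoints quoted in this thread.
    """
    host, _, _private_port = private_endpoint.rpartition(":")
    broker_label, _, rest = host.partition(".")
    return f"{broker_label}-public.{rest}:{public_port}"


example = to_public_endpoint(
    "b-2.cluster-name.uniq-id.c5.kafka.eu-central-1.amazonaws.com:9098"
)
```

This only illustrates the naming pattern. It does not enable public access on the cluster, which has to be configured separately.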

I've checked your configuration, all works fine except endpoint settings.

Please let us know if you have any further questions.

azatsafin avatar Jun 30 '22 12:06 azatsafin

@azatsafin thanks for your answer. I don't think I need to create a public endpoint for my MSK cluster. As mentioned in my initial post, I can bind just fine to the endpoints from my localhost (I use a VPN peered with my VPC):

Without VPN ON:

abezard:~ abezard$ telnet b-1.<CLUSTER>.ttxeg3.c2.kafka.us-west-1.amazonaws.com 9098
Trying 10.10.10.10...
^C

With VPN ON:

abezard:~ abezard$ telnet b-1.<CLUSTER>.ttxeg3.c2.kafka.us-west-1.amazonaws.com 9098
Trying 10.10.10.10...
Connected to b-1.<CLUSTER>.ttxeg3.c2.kafka.us-west-1.amazonaws.com.
Escape character is '^]'.
^CConnection closed by foreign host.

As far as I know, the MSK public endpoints only exist to make the cluster reachable over the network, and network access to the brokers doesn't seem to be the problem for me here.
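The telnet check above can also be scripted. Below is a minimal Python sketch using only the standard library; `can_connect` and `check_brokers` are hypothetical helper names, and the bootstrap string passed in is assumed to use the same comma-separated host:port format as KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_brokers(bootstrap_servers, timeout=5.0):
    """Check every 'host:port' entry of a comma-separated bootstrap string."""
    results = {}
    for entry in bootstrap_servers.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        results[entry] = can_connect(host, int(port), timeout)
    return results
```

A successful result only shows that the TCP path to the broker is open, like the telnet output above. It says nothing about whether the IAM identity is authorized on the cluster.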

abezard avatar Jun 30 '22 17:06 abezard

Also you can see in the logs that I get some of the following entries:

2022-06-18 01:33:41,488 INFO  [parallel-1] o.a.k.c.s.a.AbstractLogin: Successfully logged in.

So it really feels like it's half-working; I just don't understand why it eventually fails. The final error makes it look like a permissions issue: Caused by: org.apache.kafka.common.errors.ClusterAuthorizationException: Cluster authorization failed., but I literally have admin access on the whole AWS account, so I'm pretty lost right now.

abezard avatar Jun 30 '22 17:06 abezard

Also, I'm not incredibly familiar with Java so I don't know whether this matters, but the parallel-1 stream always seems fine; it's always parallel-2 or another even-numbered stream (4, 6, etc.) that fails.

abezard avatar Jun 30 '22 17:06 abezard

Hi @abezard, is IAM authentication enabled on your Kafka cluster? If so, could you please share your cluster configuration and the CloudTrail logs that have the event source kafka-cluster.amazonaws.com?
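If the CloudTrail logs are available as JSON files, a short Python sketch like the one below could pull out the relevant entries. It assumes the usual CloudTrail file layout with a top-level `Records` list whose items carry an `eventSource` field; the sample record values are placeholders made up for illustration, not real log data, and `kafka_cluster_events` is a hypothetical helper name.

```python
import json


def kafka_cluster_events(cloudtrail_json, source="kafka-cluster.amazonaws.com"):
    """Return the records of a CloudTrail JSON document matching one event source."""
    records = json.loads(cloudtrail_json).get("Records", [])
    return [record for record in records if record.get("eventSource") == source]


# Placeholder data for illustration only; field values are invented.
sample = json.dumps(
    {
        "Records": [
            {"eventSource": "kafka-cluster.amazonaws.com", "eventName": "ExampleEvent"},
            {"eventSource": "s3.amazonaws.com", "eventName": "OtherEvent"},
        ]
    }
)
matches = kafka_cluster_events(sample)
```

Sharing only the filtered records keeps unrelated account activity out of the issue.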

azatsafin avatar Jul 01 '22 08:07 azatsafin

This issue has been automatically marked as stale because no requested feedback has been provided. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 09 '22 02:07 github-actions[bot]

I had a similar issue. Try the following policy statement. It allows all kafka-cluster actions on all resources, which should help diagnose the problem; you can then reduce the privileges by specifying actions and resources again.

{
    "Sid": "kafka",
    "Effect": "Allow",
    "Action": [
        "kafka-cluster:*"
    ],
    "Resource": "*"
}
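The snippet above is a single statement, so it has to sit inside a policy document before it can be attached. Here is a minimal Python sketch that wraps it; the `"Version": "2012-10-17"` element is the standard IAM policy language version, and `build_policy` is a hypothetical helper name.

```python
import json


def build_policy(statement):
    """Wrap one IAM statement in a complete policy document."""
    return {"Version": "2012-10-17", "Statement": [statement]}


kafka_statement = {
    "Sid": "kafka",
    "Effect": "Allow",
    "Action": ["kafka-cluster:*"],
    "Resource": "*",
}

# JSON text that can be pasted into the IAM policy editor.
policy_json = json.dumps(build_policy(kafka_statement), indent=2)
```

Once the connection works, the wildcard action and resource can be narrowed to the specific kafka-cluster actions and cluster ARNs that are actually needed.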

rizvnn avatar Jul 28 '22 18:07 rizvnn

Closing due to inactivity.

Haarolean avatar Oct 20 '22 15:10 Haarolean