KIP-368: Allow SASL Connections to Periodically Re-Authenticate

Open kprzybyla opened this issue 6 months ago • 0 comments

Changes

Fixes #1080 Fixes #1015

I recently discovered that aiokafka does not implement KIP-368, which is essential for OAUTHBEARER authentication, for example when using AWS MSK. As someone who does not know Kafka internals and just wants to use a particular authentication method, I found this far from obvious: according to KIP-368, the server simply closes the connection of any client whose token has expired. From the client side, it looks like the Kafka server is terminating the connection for no reason. It was especially baffling because the Amazon MSK library "hardcodes" the session timeout to 15 minutes while the token is actually granted with a 1-hour expiration time, which made the 1-hour interval between errors harder to notice.
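For readers unfamiliar with KIP-368, here is a minimal sketch of the client-side timing it implies. It assumes the broker reports a session lifetime in milliseconds with the authentication response (zero meaning no limit), and that the client re-authenticates at a random point between 85% and 95% of that lifetime, which is my understanding of what the Java client does. The function name `reauth_delay_ms` is hypothetical and is not part of aiokafka.

```python
import random
from typing import Optional


def reauth_delay_ms(
    session_lifetime_ms: int, rng: Optional[random.Random] = None
) -> Optional[int]:
    """Return how long to wait before re-authenticating, in milliseconds.

    Returns None when the broker reported no session lifetime (0),
    in which case no re-authentication is needed.
    """
    if session_lifetime_ms <= 0:
        return None
    rng = rng or random.Random()
    # Assumed window: 85% leaves headroom for latency and clock drift,
    # and up to 10% of jitter spreads re-authentication across connections.
    fraction = 0.85 + rng.random() * 0.10
    return int(session_lifetime_ms * fraction)
```

With a 15-minute session (900,000 ms), this schedules re-authentication roughly 12.75 to 14.25 minutes after the connection was authenticated, before the broker would drop it.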

The Amazon MSK library's "Get Started" guide also describes how to use it with the kafka-python library, but that library does not implement KIP-368 either (see https://github.com/dpkp/kafka-python/issues/2205), which leads me to believe that not many people are aware of how problematic this is when you are trying to build something reliable. The same is true of confluent-kafka-python (see https://github.com/confluentinc/confluent-kafka-python/issues/1485), which is also used in the Amazon MSK example.

Anyway, once I figured out what was happening, I created a quick-and-dirty patch for aiokafka with a workaround that closes connections that are about to expire, and this got rid of all the connection drops from the Kafka server that I had been experiencing. This "patched" version is now used in production at the company I worked for, and it works flawlessly, but due to certain implementation shortcuts it generates harmless but still annoying errors. So, once it was proven to solve the issue we had been battling for a long time, I decided to contribute a polished solution so that nobody else has to go through the same frustration I went through to get this working reliably.
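To illustrate the idea behind that workaround (not the actual patch), here is a minimal sketch of closing a connection shortly before its session would expire and replacing it with a fresh one. `TrackedConnection` and `recycle_if_expiring` are hypothetical stand-ins invented for this example, and the 60-second safety margin is an arbitrary assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrackedConnection:
    """Hypothetical stand-in for a broker connection; not an aiokafka class."""

    session_lifetime_s: float
    opened_at: float = field(default_factory=time.monotonic)
    closed: bool = False

    def close(self) -> None:
        self.closed = True


def recycle_if_expiring(
    conn: TrackedConnection, margin_s: float = 60.0, now: Optional[float] = None
) -> TrackedConnection:
    """Return conn, or a fresh connection if conn is within margin_s of expiry."""
    now = time.monotonic() if now is None else now
    expires_at = conn.opened_at + conn.session_lifetime_s
    if now >= expires_at - margin_s:
        # Close on our own terms instead of waiting for the broker to drop us.
        conn.close()
        return TrackedConnection(conn.session_lifetime_s, opened_at=now)
    return conn
```

Closing early trades an extra reconnect for predictability: the client decides when the connection goes away, so requests in flight are not cut off by the broker.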

The OAUTHBEARER SASL mechanism was also painful to configure in the Kafka server, since it is implemented differently from the other SASL mechanisms. Setting up OAUTHBEARER together with other SASL mechanisms appears to be impossible; at least, I did not find a way to make it work. Because of this, a separate Docker container runs specifically for the OAUTHBEARER tests.

Also, for anyone reading this who will use AWS MSK: please remember to set a static value for AWS_ROLE_SESSION_NAME (different for each client). Otherwise, re-authentication will fail, because by default the session name ends with the current timestamp, and the session name must stay the same for a given client.
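One way to do this, sketched under the assumption that the variable is read from the process environment when the AWS session is created, is to set it early at startup, before any token provider is constructed. The helper name `pin_role_session_name` and the `kafka-client-` prefix are made up for this example.

```python
import os


def pin_role_session_name(client_id: str) -> str:
    """Set a stable, per-client AWS_ROLE_SESSION_NAME and return it.

    The name must not change between re-authentications of the same client,
    so it is derived from a fixed client identifier rather than a timestamp.
    """
    name = f"kafka-client-{client_id}"
    os.environ["AWS_ROLE_SESSION_NAME"] = name
    return name


# Call once at startup, before creating the OAUTHBEARER token provider.
pin_role_session_name("orders-consumer-1")
```

Setting the variable in the deployment configuration (for example, the container environment) achieves the same thing without code, as long as each client gets its own value.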

Java reference implementation of the same feature: https://github.com/apache/kafka/pull/5582

Checklist

  • [X] I think the code is well written
  • [X] Unit tests for the changes exist
  • [X] Documentation reflects the changes
  • [X] Add a new news fragment into the CHANGES folder

kprzybyla avatar May 06 '25 06:05 kprzybyla