s3-connector-for-apache-kafka
Out-of-Memory errors when sinking large topics
Scenario Overview
We have several topics, each already containing gigabytes of data (roughly 1–10 million records per topic). We need to export this data to S3.
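For reference, our sink configuration is essentially the stock setup, roughly like the sketch below (bucket, region, and credentials are placeholders; the connector class and property names are taken from the connector README and may differ between connector versions):

```properties
# Sketch of one of our S3 sink connector configs (values are placeholders).
# Property names follow the connector README; they may differ by connector version.
name=s3-sink-topic-a
connector.class=io.aiven.kafka.connect.s3.AivenKafkaConnectS3SinkConnector
tasks.max=1
topics=topic-a

# AWS / S3 target (redacted)
aws.access.key.id=<redacted>
aws.secret.access.key=<redacted>
aws.s3.bucket.name=<our-bucket>
aws.s3.region=eu-central-1

# Output settings
file.compression.type=gzip
```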
Issue:
Using the Aiven S3 connector, we run into out-of-memory errors indicating that the Kafka Connect JVM process does not have enough heap space.
Consequences:
The S3 connector fails with errors, the entire Kafka Connect cluster starts lagging, and the Aiven CLI stops working and returns a 503 error.
Details:
Looking at the logs, it appears that the connector is continuously ingesting messages from the topic and buffering them in memory (the log messages come from here).
The connector does not seem to write to S3 fast enough, so the buffered records are not freed in time.
We managed to get rid of the out-of-memory errors by scaling up the Kafka Connect cluster. However, this is not a suitable long-term solution, as we will need to set up multiple such connectors in parallel in the future.
We would like some way to control the memory consumption of the connector, e.g. a configuration option for the maximum size of the buffer of ingested records.
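As a partial workaround, we are looking at capping how much data each task pulls per poll via Kafka Connect's per-connector client overrides (KIP-458). A minimal sketch, assuming the worker permits overrides (connector.client.config.override.policy=All); the values are illustrative, not tuned, and file.max.records is taken from the connector docs and may not exist in every version:

```properties
# Added on top of the sink config above; values are illustrative, not tuned.
# Requires the worker to permit overrides: connector.client.config.override.policy=All

# Fewer records handed to the sink task per poll()
consumer.override.max.poll.records=500

# Smaller fetch batches from the brokers (bytes)
consumer.override.max.partition.fetch.bytes=524288
consumer.override.fetch.max.bytes=5242880

# If supported by the connector version, flush output files after fewer records
# (property name taken from the connector docs; may differ by version)
file.max.records=10000
```

This only limits how much each poll fetches, not the total amount the connector keeps buffered in memory, so it is at best a mitigation rather than the control we are asking for.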
PS: In our tests, the Confluent S3 connector provided by Aiven (version 5.0.0) does not run into out-of-memory errors and uses far less memory, but it is not an option for us.