kafka-connect-storage-cloud
[feature request] flush instead of crashing in case of memory shortage
Hello,
We archive to S3 the data going through certain topics; we configured a couple of S3 sink connectors to do this.
To keep this data usable, we try to avoid fragmenting it into numerous small files on S3, so we configured time partitioning, flush size, and rotate interval to produce larger files. But now our Connect cluster is unstable and fails randomly with OOMs because too much data sits in memory, and we are now fiddling with s3.part.size, following this SO post.
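For reference, here is a minimal sketch of the kind of connector configuration involved, submitted to the Connect REST API; the connector name, topic, bucket, and values are illustrative, not our exact settings:

```python
import json
import requests  # assumes the 'requests' package is available

# Illustrative S3 sink settings: fewer, larger files via time partitioning,
# a high flush.size, and a long rotate interval -- the combination that
# keeps more data buffered in memory per open output file.
connector_config = {
    "name": "s3-archive-example",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "4",
        "topics": "events.example",              # hypothetical topic
        "s3.bucket.name": "my-archive-bucket",   # hypothetical bucket
        "s3.region": "eu-west-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
        "partition.duration.ms": "3600000",      # one hour per partition directory
        "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
        "locale": "en-US",
        "timezone": "UTC",
        "flush.size": "1000000",                 # large, so files are not tiny
        "rotate.interval.ms": "3600000",         # rotate at most once per hour
        "s3.part.size": "26214400",              # 25 MB multipart buffer (the default)
    },
}

# Create the connector on a local Connect worker (adjust the URL as needed).
resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector_config),
)
resp.raise_for_status()
```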
Worse, the situation is a kind of dead end: after an OOM, if we restart the connector (or the whole Connect cluster), the same situation happens again (the data does not fit any better in memory), and the cluster does not resume normal operation. So we are stuck reconfiguring memory settings or flush size to resume operation.
This is not satisfactory at all; I'd prefer a rock-solid Connect cluster I can trust, without being too dependent on any hard limit.
What about flushing the data to S3 automatically when memory becomes scarce? With a couple of error/warning logs, of course. At least the connector wouldn't crash or stop randomly and require an on-call engineer to fiddle with settings to get it running again.
Or maybe I'm missing the point, or a config setting. In that case, I'd be happy to learn ;-)
For example, s3.part.size = 5MB means that a task uploads at most 5 MB of buffered bytes to S3 at a time.
Here are some ideas; I don't know if they can help you:
- Reducing this parameter will reduce the memory allocated for each S3 output stream.
- Increasing the number of machines in the Connect cluster lets each machine take on fewer tasks (and therefore fewer S3 output streams).
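As a rough back-of-the-envelope model (my assumption, not something documented in this thread): each open output file keeps roughly one s3.part.size worth of data buffered in memory, so worst-case buffered bytes per worker scale with the number of open output files:

```python
def estimate_buffered_bytes(s3_part_size_bytes, open_files_per_task, tasks_per_worker):
    """Rough worst-case estimate of S3 upload buffers held by one Connect worker.

    Assumes each open output file (one per topic partition x time partition)
    holds roughly one multipart buffer of s3.part.size bytes in memory.
    Real usage also includes consumer buffers, format/compression overhead, etc.
    """
    return s3_part_size_bytes * open_files_per_task * tasks_per_worker


# Example: default 25 MB parts, 8 open output files per task, 6 tasks on one worker
# -> roughly 1.2 GiB of part buffers alone, before any other heap usage.
print(estimate_buffered_bytes(25 * 1024 * 1024, 8, 6) / (1024 ** 3), "GiB")
```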
No, do not reduce s3.part.size!
Use this -> https://github.com/confluentinc/kafka-connect-storage-cloud/pull/320
@raphaelauv well, it sounds promising, but these PRs seem stalled... Any idea when they could land?
Confluent doesn't review contributions, so very probably never.
Hi, I wonder why we shouldn't reduce s3.part.size?
@tjgykhulj it is generally considered that the optimal S3 file size for analytics needs is around 200 MB; check the internet for more details.