s3-connector-for-apache-kafka

Extend timestamp variable parameters in File name format

Open BulgakovKD opened this issue 11 months ago • 1 comments

Scenario Overview

We use the S3 file name template prefix/%Y_%m_%d__%H_%M_%S_%f so that file names sort alphabetically in creation order: each new file is guaranteed to receive a name that sorts after every earlier one.
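To illustrate why such names sort chronologically, here is a minimal Java sketch. It assumes that the java.time pattern yyyy_MM_dd__HH_mm_ss_SSSSSS is an acceptable equivalent of the strftime-style template above; the class and method names are hypothetical and not part of the connector.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortableNames {
    // Assumed java.time equivalent of %Y_%m_%d__%H_%M_%S_%f (all fields zero-padded).
    public static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy_MM_dd__HH_mm_ss_SSSSSS");

    // Builds a file name under the fixed prefix used in the scenario.
    public static String name(ZonedDateTime t) {
        return "prefix/" + FMT.format(t);
    }

    public static void main(String[] args) {
        ZonedDateTime base = ZonedDateTime.of(2024, 2, 29, 9, 59, 59, 0, ZoneOffset.UTC);
        List<String> names = new ArrayList<>();
        names.add(name(base.plusSeconds(1)));   // 10:00:00.000000
        names.add(name(base));                  // 09:59:59.000000
        names.add(name(base.plusNanos(1_000))); // 09:59:59.000001
        // Plain lexicographic sort; fixed-width fields make it match time order.
        Collections.sort(names);
        names.forEach(System.out::println);
    }
}
```

Because every field has a fixed width and the fields go from most to least significant, string comparison and time comparison agree.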

In Kafka, we have several partitions of one topic, and each of them must be written in order under the same prefix (prefix=topic_name). With this template, the file order can be ensured by running no more than one connector task.

Issue:

The timestamp variable currently has the following parameters:

unit parameter values:
yyyy - year, e.g. 2020 (please note that YYYY is deprecated and is interpreted as yyyy)
MM - month, e.g. 03
dd - day, e.g. 01
HH - hour, e.g. 23

Consequences:

With these parameters, files written within the same hour get identical names. Adding the partition number and offset to the file name template can solve this problem, but it makes working with the root prefix more difficult. Uniqueness can instead be ensured by adding minutes, seconds, and milliseconds to the timestamp variable.
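The collision can be reproduced with a small sketch. The two patterns below are hypothetical stand-ins for a file name template: one uses only the four supported units, the other assumes the java.time pattern letters mm, ss and SSS for the proposed additions. None of these names come from the connector itself.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class HourlyNameCollision {
    // Stand-in for a template limited to the currently supported units.
    public static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy_MM_dd__HH");
    // Same template with minutes, seconds and milliseconds (assumed pattern letters).
    public static final DateTimeFormatter FINE =
            DateTimeFormatter.ofPattern("yyyy_MM_dd__HH_mm_ss_SSS");

    public static String name(DateTimeFormatter formatter, ZonedDateTime t) {
        return "prefix/" + formatter.format(t);
    }

    public static void main(String[] args) {
        ZonedDateTime first = ZonedDateTime.of(2024, 2, 29, 14, 2, 5, 0, ZoneOffset.UTC);
        ZonedDateTime second = first.plusMinutes(30); // still within the same hour

        // Hour resolution: both files map to the same name.
        System.out.println(name(HOURLY, first).equals(name(HOURLY, second)));
        // Millisecond resolution: the names differ.
        System.out.println(name(FINE, first).equals(name(FINE, second)));
    }
}
```

Finer resolution only makes collisions less likely. Two files flushed within the same millisecond would still share a name.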

Details:

It looks like it is enough to extend the following code:

    private static final Map<String, DateTimeFormatter> TIMESTAMP_FORMATTERS =
            Map.of(
                    "yyyy", DateTimeFormatter.ofPattern("yyyy"),
                    "MM", DateTimeFormatter.ofPattern("MM"),
                    "dd", DateTimeFormatter.ofPattern("dd"),
                    "HH", DateTimeFormatter.ofPattern("HH")
            );

with the following parameters:

"%M" - Minutes in two-digit format.
"%S" - Seconds in two-digit format.
"%f" - Microseconds.
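A possible shape of the extended map is sketched below. The list above uses strftime-style names (%M, %S, %f), while the existing keys are java.time pattern letters, so this sketch assumes the keys mm, ss and SSSSSS (six fraction digits, i.e. microseconds) instead. These key names, the class, and the format helper are hypothetical and not the connector's actual API.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;

public class ExtendedTimestampUnits {
    // The four existing units plus three assumed ones for finer resolution.
    public static final Map<String, DateTimeFormatter> TIMESTAMP_FORMATTERS =
            Map.of(
                    "yyyy", DateTimeFormatter.ofPattern("yyyy"),
                    "MM", DateTimeFormatter.ofPattern("MM"),
                    "dd", DateTimeFormatter.ofPattern("dd"),
                    "HH", DateTimeFormatter.ofPattern("HH"),
                    "mm", DateTimeFormatter.ofPattern("mm"),        // minutes, two digits
                    "ss", DateTimeFormatter.ofPattern("ss"),        // seconds, two digits
                    "SSSSSS", DateTimeFormatter.ofPattern("SSSSSS") // microseconds, six digits
            );

    // Hypothetical helper: formats one unit of the timestamp, rejecting unknown units.
    public static String format(String unit, ZonedDateTime timestamp) {
        DateTimeFormatter formatter = TIMESTAMP_FORMATTERS.get(unit);
        if (formatter == null) {
            throw new IllegalArgumentException("Unsupported timestamp unit: " + unit);
        }
        return formatter.format(timestamp);
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.of(2024, 2, 29, 14, 2, 5, 123_456_000, ZoneOffset.UTC);
        System.out.println(format("mm", t));
        System.out.println(format("ss", t));
        System.out.println(format("SSSSSS", t));
    }
}
```

Any validation of the unit parameter elsewhere in the connector would presumably need to accept the new keys as well.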

BulgakovKD, Feb 29 '24 14:02