kafka-connect-storage-common

Extend the list of basic partitioners: FieldAndTimeBasedPartitioner.java & HeaderAndTimeBasedPartitioner.java

Open: ostetsenko opened this issue 2 years ago • 0 comments

We use Kafka Connect to dump topics to AWS S3. Analyzing the data is straightforward with Athena + AWS Glue (Crawlers) + AWS S3, which seems to be a common setup for AWS users.

Problem: The core problem occurs when we partition by fields from the Kafka message. Athena cannot create the table: the parts of the S3 subpath become partition columns, and every JSON key also becomes a column, so a field used for partitioning appears twice. A table cannot have two columns with the same name.
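To make the collision concrete, here is a minimal Java sketch that lists the names appearing both as partition columns in an S3 object key and as top-level JSON keys of the stored records. It assumes Hive-style `name=value` path segments (the layout a field-based partitioner typically writes); the class name, the example key, and the field names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical helper: finds column names that would appear twice in an
 * Athena/Glue table because a Hive-style partition segment ("name=value")
 * in the S3 key repeats a top-level JSON key of the stored records.
 */
public class PartitionColumnClash {

    /** Collects partition column names from a key such as "topics/orders/customer_id=42/part-0.json". */
    static Set<String> partitionColumns(String s3Key) {
        Set<String> columns = new LinkedHashSet<>();
        for (String segment : s3Key.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) { // only "name=value" segments become named partition columns
                columns.add(segment.substring(0, eq));
            }
        }
        return columns;
    }

    /** Returns the names that are both partition columns and JSON keys, in path order. */
    static Set<String> clashes(String s3Key, Set<String> jsonKeys) {
        Set<String> result = new LinkedHashSet<>(partitionColumns(s3Key));
        result.retainAll(jsonKeys);
        return result;
    }

    public static void main(String[] args) {
        String key = "topics/orders/customer_id=42/year=2023/part-0.json";
        Set<String> jsonKeys = Set.of("customer_id", "amount", "created_at");
        System.out.println(clashes(key, jsonKeys)); // prints [customer_id]
    }
}
```

Any non-empty result means the crawler would try to create two columns with the same name.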

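Solution: Add a partitioner based on a header field and time (HeaderAndTimeBasedPartitioner), so the partition value comes from a record header instead of the message payload and does not collide with the JSON keys.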
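A minimal sketch of the path such a partitioner could generate, assuming Hive-style segments and UTC hour-level time buckets. `HeaderAndTimePathSketch` and `encode` are hypothetical names, and the headers are modeled as a plain `Map<String, String>` rather than the Connect `Headers` type. An actual partitioner would read the value from the record's headers (for example `record.headers().lastWithName(...)`) and plug into the connector's partitioner interface, which this sketch does not do.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;

/**
 * Hypothetical sketch of header-and-time path encoding. It does NOT implement
 * the connector's Partitioner interface; it only shows the resulting layout:
 * one Hive-style segment taken from a record header, followed by time
 * segments derived from the record timestamp (in UTC).
 */
public class HeaderAndTimePathSketch {

    // Quoted text is literal; '=' and '/' are printed as-is.
    private static final DateTimeFormatter TIME_PATH =
            DateTimeFormatter.ofPattern("'year'=yyyy/'month'=MM/'day'=dd/'hour'=HH");

    /** Builds "headerName=value/year=.../month=.../day=.../hour=..." for one record. */
    static String encode(Map<String, String> headers, String headerName, long timestampMs) {
        String value = headers.get(headerName);
        if (value == null) {
            throw new IllegalArgumentException("missing header: " + headerName);
        }
        ZonedDateTime time = Instant.ofEpochMilli(timestampMs).atZone(ZoneOffset.UTC);
        return headerName + "=" + value + "/" + TIME_PATH.format(time);
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of("tenant", "acme");
        // 1674136800000 ms = 2023-01-19T14:00:00Z
        System.out.println(encode(headers, "tenant", 1674136800000L));
        // prints tenant=acme/year=2023/month=01/day=19/hour=14
    }
}
```

Because the partition value comes from a header rather than the payload, the `tenant` column does not duplicate a JSON key, unless the payload happens to contain a key with the same name (or one of the time segment names `year`, `month`, `day`, `hour`). Header values containing `/` would also need escaping before being used in a path.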

Extra: There is a good custom partitioner, FieldAndTimeBasedPartitioner, which could also be included as a default in this repo.

ostetsenko, Jan 19 '23 14:01