kafka-connect-storage-common
Extend the list of basic partitioners: FieldAndTimeBasedPartitioner.java & HeaderAndTimeBasedPartitioner.java
We use Kafka Connect to dump topics to AWS S3. Analyzing the data is then pretty simple with Athena + AWS Glue (Crawlers) + AWS S3. This looks like a common setup for AWS users.
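Problem The basic problem occurs when we partition by fields from the Kafka message. Athena cannot create a table because each partition part of the S3 subpath becomes a separate column, and every JSON key becomes a separate column too. A field used for partitioning therefore appears twice, once from the path and once from the message body, and two columns with the same name are not allowed.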
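To make the collision concrete, here is a minimal sketch in plain Java. It assumes Hive-style `key=value` path segments and a hypothetical record with the JSON keys `country`, `id` and `amount`. The class and method names are made up for illustration and are not part of Kafka Connect, Glue or Athena.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: hypothetical helper, not part of any AWS or Kafka Connect API.
final class ColumnCollisionSketch {

    // Collects the column names implied by Hive-style "key=value" path segments.
    public static Set<String> partitionColumns(String s3Subpath) {
        Set<String> columns = new LinkedHashSet<>();
        for (String segment : s3Subpath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                columns.add(segment.substring(0, eq));
            }
        }
        return columns;
    }

    // Returns the names that would appear both as a partition column and as a JSON key.
    public static Set<String> collidingColumns(String s3Subpath, Set<String> jsonKeys) {
        Set<String> collisions = partitionColumns(s3Subpath);
        collisions.retainAll(jsonKeys);
        return collisions;
    }
}
```

For the assumed subpath `country=US/year=2024/month=01` and the keys above, the only colliding name is `country`, which is the field that was used for partitioning.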
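Solution It would be a good idea to add a partitioner based on a header field and time. Header values are not part of the JSON body, so the partition columns taken from the path no longer duplicate the JSON keys.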
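A rough sketch of the path such a partitioner could produce is shown below. It is plain Java with no Kafka Connect dependency. The class name, the method `encodePartition(headers, headerNames, timestampMs)` and the header name `tenant` are assumptions for illustration, not the API of this repo. The time layout mirrors the usual `year=/month=/day=/hour=` style and is fixed to UTC here for simplicity.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;

// Illustration only: a hypothetical stand-in for a HeaderAndTimeBasedPartitioner.
final class HeaderAndTimePathSketch {

    // Hive-style time layout; quoted words are literals, '=' and '/' are emitted as-is.
    private static final DateTimeFormatter TIME_FORMAT = DateTimeFormatter
            .ofPattern("'year'=yyyy/'month'=MM/'day'=dd/'hour'=HH")
            .withZone(ZoneOffset.UTC);

    // Builds "<header>=<value>/.../year=.../month=.../day=.../hour=..." for one record.
    public static String encodePartition(Map<String, String> headers,
                                         List<String> headerNames,
                                         long timestampMs) {
        StringBuilder path = new StringBuilder();
        for (String name : headerNames) {
            String value = headers.get(name);
            if (value == null) {
                // Assumption: a missing header is treated as an error rather than defaulted.
                throw new IllegalArgumentException("Missing header: " + name);
            }
            path.append(name).append('=').append(value).append('/');
        }
        path.append(TIME_FORMAT.format(Instant.ofEpochMilli(timestampMs)));
        return path.toString();
    }
}
```

With a header `tenant=acme` and a record timestamp of `0` (the Unix epoch), the resulting subpath is `tenant=acme/year=1970/month=01/day=01/hour=00`. Since `tenant` exists only in the header, it cannot clash with a JSON key unless the message body happens to reuse the same name.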
Extra There is also a good custom partitioner, FieldAndTimeBasedPartitioner, which could be used as a default in this repo as well.