streamx
Saving JSON data, partitioned by a specific field (timestamp)
I have a question. The data in Kafka is in JSON format, and each event has a field called "eventTimestamp", a long number that represents the event time. I want to save the data to S3 in hourly buckets based on that timestamp, not on the time the event was added to Kafka.
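For example, an event looks roughly like this (the other fields are only illustrative; eventTimestamp is the one that matters):

{ "eventTimestamp": 1466035200000, "eventType": "click", ... }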
My settings when I used the Kafka Connect S3 sink were:
connector.class=io.confluent.connect.s3.S3SinkConnector
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
timestamp.extractor=RecordField
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
timestamp.field=eventTimestamp
partition.duration.ms=10
locale=en_IN
timezone=UTC
I see that streamx supports TimeBasedPartitioner, but if I understand correctly it can only extract a RecordField timestamp from Parquet or Avro records, not from JSON.
Is it possible to do this with JSON?
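To make the intent concrete, here is a rough sketch in plain Java (using Jackson) of the logic I want: read eventTimestamp (epoch milliseconds) out of the JSON record value and derive the hourly S3 prefix from it. This is only an illustration, not streamx code; the class and method names are my own, and I don't know which extension point (if any) streamx offers to plug something like this in.

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EventTimePartition {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Same layout as the path.format setting above, always evaluated in UTC.
    private static final DateTimeFormatter HOURLY_PATH = DateTimeFormatter
            .ofPattern("'year='yyyy'/month='MM'/day='dd'/hour='HH")
            .withZone(ZoneOffset.UTC);

    // Given the raw JSON value of a Kafka record, return the hourly prefix
    // derived from its eventTimestamp (epoch milliseconds), ignoring the
    // time the record was written to Kafka.
    public static String partitionFor(String jsonValue) throws Exception {
        JsonNode event = MAPPER.readTree(jsonValue);
        long eventTimestamp = event.get("eventTimestamp").asLong();
        return HOURLY_PATH.format(Instant.ofEpochMilli(eventTimestamp));
    }

    public static void main(String[] args) throws Exception {
        // Prints: year=2016/month=06/day=16/hour=00
        System.out.println(partitionFor("{\"eventTimestamp\": 1466035200000}"));
    }
}

If streamx's TimeBasedPartitioner could be pointed at logic like this for JSON values, the way timestamp.extractor=RecordField works for Avro/Parquet, that would cover my use case.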