streamx
Saving JSON data, partitioned by a specific field (timestamp)
I have a question. The data in Kafka is in JSON format, and each event has a field called "eventTimestamp", a long number that represents the event time. I want to save the data to S3 in hourly buckets based on that timestamp, not on the time the event was added to Kafka.
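For example, an event looks roughly like this (the other fields are only illustrative; eventTimestamp is the one that matters):

{ "eventTimestamp": 1466035200000, "eventType": "click", ... }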
My settings when I used the Kafka Connect S3 sink were:
connector.class=io.confluent.connect.s3.S3SinkConnector
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
timestamp.extractor=RecordField
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
timestamp.field=eventTimestamp
partition.duration.ms=10
locale=en_IN
timezone=UTC
I see that streamx supports TimeBasedPartitioner, but if I understand correctly it can only extract a RecordField timestamp from Parquet or Avro records, not from JSON.
Is it possible to do this with JSON?
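To make the intent concrete, here is a rough sketch in plain Java (using Jackson) of the logic I want: read eventTimestamp (epoch milliseconds) out of the JSON record value and derive the hourly S3 prefix from it. This is only an illustration, not streamx code; the class and method names are my own, and I don't know which extension point (if any) streamx offers to plug something like this in.

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EventTimePartition {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Same layout as the path.format setting above, always evaluated in UTC.
    private static final DateTimeFormatter HOURLY_PATH = DateTimeFormatter
            .ofPattern("'year='yyyy'/month='MM'/day='dd'/hour='HH")
            .withZone(ZoneOffset.UTC);

    // Given the raw JSON value of a Kafka record, return the hourly prefix
    // derived from its eventTimestamp (epoch milliseconds), ignoring the
    // time the record was written to Kafka.
    public static String partitionFor(String jsonValue) throws Exception {
        JsonNode event = MAPPER.readTree(jsonValue);
        long eventTimestamp = event.get("eventTimestamp").asLong();
        return HOURLY_PATH.format(Instant.ofEpochMilli(eventTimestamp));
    }

    public static void main(String[] args) throws Exception {
        // Prints: year=2016/month=06/day=16/hour=00
        System.out.println(partitionFor("{\"eventTimestamp\": 1466035200000}"));
    }
}

If streamx's TimeBasedPartitioner could be pointed at logic like this for JSON values, the way timestamp.extractor=RecordField works for Avro/Parquet, that would cover my use case.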