kafka-connect-storage-cloud
Wrong Last Modified Time of S3 Object
Hi there,
Version
S3 Sink Connector: 10.5.0 Kafka: 3.5.0
Problem
I upload data from Kafka to S3 every 30 minutes. However, as the amount of data increases, the last-modified time shown in S3 becomes wrong: it is earlier than the actual upload time. For example, in the listing below, my files stamped 21:23:54 and 21:24:29 contain all data from 21:00:00 to 21:29:59.
There are no issues with actual upload time and data integrity. The only problem is the "last-modified time indicated in S3".
Does anybody know about this issue? Thank you.
...
2023-09-10 17:30:01 24067022 my-data+0+0012998208.json.gz
2023-09-10 17:30:01 24804019 my-data+1+0013021328.json.gz
2023-09-10 18:00:01 25148397 my-data+0+0013081184.json.gz
2023-09-10 18:00:01 25295342 my-data+1+0013105757.json.gz
...
2023-09-10 21:23:54 33226385 my-data+1+0013762488.json.gz
2023-09-10 21:24:29 32369427 my-data+0+0013733998.json.gz
...
I found one more thing. The problem occurs when the file is larger than about 26 MiB. The "last modified" time is recorded about 1 minute earlier for every additional 1 MiB or so above 26 MiB. My network bandwidth and computing power are sufficient. I suspect it is due to the topic having too few partitions. But what are the criteria for choosing the number of partitions in terms of file size or transfer speed?
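To make the size pattern easier to check, here is a rough Python sketch that parses the listing lines above and reports, for each object, its size in MiB and how far its last-modified time is from the nearest 30-minute boundary. It assumes the intended upload time is that nearest boundary (an assumption based on the 30-minute schedule, not something the connector reports), and the names `parse_line` and `offset_from_boundary` are made up for illustration.

```python
from datetime import datetime

# Listing lines copied from the output above (elided rows left out).
LISTING = """\
2023-09-10 17:30:01 24067022 my-data+0+0012998208.json.gz
2023-09-10 17:30:01 24804019 my-data+1+0013021328.json.gz
2023-09-10 18:00:01 25148397 my-data+0+0013081184.json.gz
2023-09-10 18:00:01 25295342 my-data+1+0013105757.json.gz
2023-09-10 21:23:54 33226385 my-data+1+0013762488.json.gz
2023-09-10 21:24:29 32369427 my-data+0+0013733998.json.gz
"""

MIB = 1024 * 1024
INTERVAL_SECONDS = 30 * 60  # assumed flush schedule: every 30 minutes


def parse_line(line):
    """Split one listing line into (last_modified, size_bytes, key)."""
    date, time, size, key = line.split(maxsplit=3)
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return stamp, int(size), key


def offset_from_boundary(stamp):
    """Signed seconds from the nearest 30-minute boundary.

    Negative means the timestamp is earlier than the boundary.
    """
    midnight = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (stamp - midnight).total_seconds()
    nearest = round(elapsed / INTERVAL_SECONDS) * INTERVAL_SECONDS
    return int(elapsed - nearest)


rows = [parse_line(line) for line in LISTING.splitlines()]

for stamp, size, key in rows:
    print(f"{key}  {size / MIB:6.2f} MiB  offset {offset_from_boundary(stamp):+5d} s")
```

For the rows shown here, the four files under 26 MiB land 1 second after the boundary, while the two larger files (about 31.69 MiB and 30.87 MiB) land 366 and 331 seconds before it.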