kafka-connect-reddit
Offsets can be tracked via thing ID
The IDs Reddit associates with its threads and comments are monotonically increasing base-36 numbers. These should be used to store offset information instead of timestamps. Two posts or comments can be created with the same timestamp, and a timestamp alone then cannot tell the connector which of them it has already processed. This should be implemented carefully to preserve backwards compatibility. To allow a rolling upgrade of the connector, offsets should still be written with submission timestamps, and the connector should still support filtering messages based on timestamp. However, it should give priority to IDs in offsets if it finds them.
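A minimal sketch of the proposed behaviour, assuming offsets are plain Kafka Connect offset maps. The class `RedditOffsets`, its methods, and the offset keys `"id"` and `"timestamp"` are hypothetical names for illustration, not the connector's existing code. It also assumes each offset tracks a single stream (threads or comments), because thread and comment IDs are separate sequences and should not be compared with each other. New offsets carry both fields so an older connector version can still read the timestamp. When reading, the ID wins if present, and timestamp-only offsets fall back to the old comparison.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: ID-first offset comparison with a timestamp fallback.
final class RedditOffsets {
    // Hypothetical offset keys; the connector's real key names may differ.
    static final String ID_KEY = "id";
    static final String TIMESTAMP_KEY = "timestamp";

    private RedditOffsets() {}

    /** Converts a thing ID ("abc123") or fullname ("t1_abc123") to its numeric value. */
    static long idValue(String thingId) {
        // Fullnames carry a type prefix such as "t1_" or "t3_"; keep only the base-36 part.
        int underscore = thingId.indexOf('_');
        String base36 = underscore >= 0 ? thingId.substring(underscore + 1) : thingId;
        return Long.parseLong(base36, 36);
    }

    /** Builds an offset with both the ID and the timestamp, so older versions can still read it. */
    static Map<String, Object> offsetFor(String thingId, long createdEpochSeconds) {
        Map<String, Object> offset = new HashMap<>();
        offset.put(ID_KEY, thingId);
        offset.put(TIMESTAMP_KEY, createdEpochSeconds);
        return offset;
    }

    /** Returns true if the item is newer than the stored offset; IDs take priority over timestamps. */
    static boolean isNew(Map<String, ?> storedOffset, String thingId, long createdEpochSeconds) {
        if (storedOffset == null) {
            return true; // no offset stored yet, so nothing has been processed
        }
        Object storedId = storedOffset.get(ID_KEY);
        if (storedId instanceof String) {
            // Higher base-36 ID means a later item, even if the timestamps are equal.
            return idValue(thingId) > idValue((String) storedId);
        }
        Object storedTs = storedOffset.get(TIMESTAMP_KEY);
        if (storedTs instanceof Number) {
            // Offset written by an older version: fall back to timestamps (strict comparison assumed).
            // Number covers Integer or Long, depending on how the offset was deserialized.
            return createdEpochSeconds > ((Number) storedTs).longValue();
        }
        return true; // unrecognized offset: treat the item as new
    }
}
```

With an offset written by `offsetFor("t1_abc123", 1500000000L)`, a comment `t1_abc124` created in the same second is still reported as new because its ID is higher. A timestamp-only offset from an older version would skip it, which is the gap this issue describes.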