azure-functions-kafka-extension

Configurable cursor position to start processing

Open mcollier opened this issue 5 years ago • 5 comments

As a developer, I would like the ability to optionally set the starting cursor position when processing the Kafka event stream, so that I can start from a specific position in the stream instead of the last saved index.

This would make it easier to replay events if needed.

This should work for all Azure Functions GA supported languages.

There is a similar issue for Event Hubs. As a developer, I would expect the Kafka and Event Hubs extensions to function similarly in terms of manipulating the checkpoint/cursor.

  • https://github.com/Azure/azure-functions-eventhubs-extension/issues/30
  • https://github.com/Azure/azure-webjobs-sdk/issues/1240

mcollier, Mar 22 '20 16:03

would the workaround ("manually update the checkpoint before starting the function") mentioned in that EventHubs thread be something that is workable today?

ryancrawcour, Mar 22 '20 22:03

assuming we'd go for the same sort of semantics as talked about there - exposing a DateTime from which to process the stream.

not sure how we'd translate that into the Kafka event stream though. this would need some investigation.
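One way to picture the translation from a DateTime to a Kafka position: a start time would have to map to one offset per partition, namely the first message in that partition whose timestamp is at or after the start time. The sketch below is plain Python over in-memory lists, not the extension's code and not a Kafka client API. It assumes a message's offset equals its list index and that timestamps are non-decreasing within a partition; `first_offset_at_or_after`, `start_offsets` and the sample data are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional


def first_offset_at_or_after(timestamps_ms: List[int], start_ms: int) -> Optional[int]:
    """Return the first offset whose timestamp is >= start_ms, or None if there is none.

    Assumption: offset == index in the list, and timestamps are
    non-decreasing within the partition.
    """
    idx = bisect_left(timestamps_ms, start_ms)
    return idx if idx < len(timestamps_ms) else None


def start_offsets(partitions: Dict[int, List[int]], start_ms: int) -> Dict[int, Optional[int]]:
    """Resolve one start offset per partition for the same start time."""
    return {p: first_offset_at_or_after(ts, start_ms) for p, ts in partitions.items()}


# Hypothetical sample data: partition id -> message timestamps in milliseconds.
sample = {0: [100, 200, 300, 400], 1: [150, 350], 2: [50, 60]}
resolved = start_offsets(sample, 250)
```

The same start time resolves to a different offset in each partition, and a partition with no message that recent has no matching offset at all.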

tagging @fbeltrao and @amamounelsayed for visibility

ryancrawcour, Mar 22 '20 22:03

Looking at the documentation, the library we use exposes the options Earliest or Latest. No possibility to "start 5 minutes ago".

This option takes effect only when the consumer group has no recorded commits (i.e. a new consumer group). Here is a summary of how some use cases could be achieved:

| Use case | Option |
| --- | --- |
| Start consumer group with committed offsets from a different offset | Does not work. Create a new consumer group. |
| Start consumer group with no committed offsets "1 hour ago" | Not supported by the library. A terrible implementation could be to have the extension ignore messages until the timestamp catches up. |
| Start consumer group with no committed offsets from the end | Requires a change in the extension. |
| Start consumer group with no committed offsets from the start | It is the default one. |
| Start consumer group with no committed offsets from a specific numeric offset | Might be feasible to implement. I am not sure it is worth it, as the user will have to define an offset value per partition. |
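The rule behind the summary above can be sketched as a small function: a committed offset always wins, and the Earliest/Latest option only matters for a group with no commits. This is an illustrative model in plain Python, not the extension or the underlying client library. `resolve_start_offset` and its parameters are hypothetical, and it ignores edge cases such as a committed offset that is no longer available in the log.

```python
from typing import Optional

EARLIEST = "earliest"
LATEST = "latest"


def resolve_start_offset(committed: Optional[int], reset_policy: str, low: int, high: int) -> int:
    """Pick the offset a consumer starts from in one partition.

    committed: last committed offset for the group, or None for a new group.
    low: first offset still available in the partition.
    high: offset that the next produced message will receive (end of the log).
    """
    if committed is not None:
        # A committed offset takes precedence; the reset policy is not consulted.
        return committed
    if reset_policy == EARLIEST:
        return low
    if reset_policy == LATEST:
        return high
    raise ValueError(f"unknown reset policy: {reset_policy}")
```

For example, `resolve_start_offset(42, LATEST, 0, 100)` returns the committed offset 42, which is why changing the option has no effect on an existing consumer group and a new group is needed instead.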

fbeltrao, Mar 23 '20 11:03

> Not supported by the library. A terrible implementation could be to have the extension ignore messages until timestamp catches up.

agree that this sounds like a bad idea.
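To make the cost concrete, here is a rough sketch of the "ignore messages until the timestamp catches up" approach. It is plain Python over an in-memory list, with hypothetical names (`process_from`, `log`), and it assumes timestamps are non-decreasing. Every older message is still read and then thrown away before the handler sees anything.

```python
from typing import Callable, Iterable, Tuple

Message = Tuple[int, int, str]  # (offset, timestamp_ms, value)


def process_from(messages: Iterable[Message], start_ms: int, handler: Callable[[str], None]) -> int:
    """Call handler only for messages at or after start_ms; return how many were skipped."""
    skipped = 0
    for _offset, timestamp_ms, value in messages:
        if timestamp_ms < start_ms:
            # Still fetched from the stream, just discarded.
            skipped += 1
            continue
        handler(value)
    return skipped


# Hypothetical partition contents.
log = [(0, 100, "a"), (1, 200, "b"), (2, 300, "c"), (3, 400, "d")]
handled = []
skipped = process_from(log, 250, handled.append)
```

With a long retention period the skipped count could be most of the topic, so the wasted work grows with the size of the backlog rather than with the amount of data the function actually needs.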

ryancrawcour, Mar 23 '20 21:03

@jeffhollan @anirudhgarg @amamounelsayed what's the input on this from the Functions team in terms of what you're looking to do with the EventHubs extension?

ryancrawcour, Mar 23 '20 21:03