azure-functions-kafka-extension
Configurable cursor position to start processing
As a developer, I would like the ability to optionally set the starting cursor position when processing the Kafka event stream, so that I can start from a specific position in the stream instead of the last saved index.
This would make it easier to replay events if needed.
This should work for all Azure Functions GA supported languages.
There is a similar issue for Event Hubs. As a developer, I would expect the Kafka and Event Hubs extensions to function similarly in terms of manipulating the checkpoint/cursor.
- https://github.com/Azure/azure-functions-eventhubs-extension/issues/30
- https://github.com/Azure/azure-webjobs-sdk/issues/1240
Would the workaround mentioned in that EventHubs thread ("manually update the checkpoint before starting the function") be workable today?
Assuming we'd go for the same sort of semantics as discussed there: exposing a DateTime from which to process the stream.
Not sure how we'd translate that to Kafka, though. This would need some investigation.
tagging @fbeltrao and @amamounelsayed for visibility
Looking at the documentation, the library we use exposes the options Earliest or Latest. There is no way to "start 5 minutes ago".
This option only takes effect when the consumer group has no recorded commits (i.e. a new consumer group). Here is a summary of how some use cases could be achieved:
| Use case | Option |
|---|---|
| Start consumer group with committed offsets from a different offset | Does not work. Create a new consumer group |
| Start consumer group with no committed offsets "1 hour ago" | Not supported by the library. A terrible implementation could be to have the extension ignore messages until the timestamp catches up. |
| Start consumer group with no committed offsets from the end | Requires a change in the extension |
| Start consumer group with no committed offsets from the start | This is the default |
| Start consumer group with no committed offsets from a specific numeric offset | Might be feasible to implement. I am not sure it is worth it, as the user would have to define an offset value per partition |
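
To make the rule behind the table concrete, here is a minimal Python sketch of how the starting position is chosen for one partition: the Earliest/Latest setting is only consulted when the consumer group has no committed offset. The function `resolve_start_offset` and its parameters are hypothetical names used for illustration; this models the behaviour described above and is not the extension's or the client library's API.

```python
from typing import Optional


def resolve_start_offset(
    committed: Optional[int],
    reset_policy: str,
    earliest: int,
    latest: int,
) -> int:
    """Model where a consumer starts reading a single partition.

    committed:    last committed offset for the group, or None for a new group.
    reset_policy: "earliest" or "latest" (the only two options exposed).
    earliest:     first offset still available in the partition.
    latest:       offset that the next produced message will receive.
    """
    if committed is not None:
        # A committed offset always wins; the reset policy is ignored.
        return committed
    if reset_policy == "earliest":
        return earliest
    if reset_policy == "latest":
        return latest
    raise ValueError(f"unsupported reset policy: {reset_policy!r}")
```

This is why the first row says "create a new consumer group": with a committed offset present, changing the reset policy has no effect in this model.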
> Not supported by the library. A terrible implementation could be to have the extension ignore messages until timestamp catches up.
agree that this sounds like a bad idea.
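
For reference, a rough Python sketch of what "ignore messages until the timestamp catches up" would look like for a single partition. The `Message` type and `skip_until` function are hypothetical and exist only to illustrate the idea; nothing here reflects the extension's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Message:
    offset: int
    timestamp_ms: int
    value: str


def skip_until(messages: Iterable[Message], start_ms: int) -> Iterator[Message]:
    """Drop messages until one at or after start_ms is seen, then pass everything."""
    caught_up = False
    for msg in messages:
        if not caught_up and msg.timestamp_ms < start_ms:
            # The message was still fetched from the broker before being dropped.
            continue
        caught_up = True
        yield msg
```

The cost is visible in the loop: every message before the start point is still read and then discarded, so starting "1 hour ago" on a long topic means consuming the whole backlog first.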
@jeffhollan @anirudhgarg @amamounelsayed what's the input on this from the Functions team, in terms of what you're looking to do with the EventHubs extension?