keda
keda copied to clipboard
Prefix for 'goSDK' checkpoint strategy for Azure Event Hub scalter
Proposal
I suggest to add the possibility to define a prefix for the 'goSDK' checkpoint strategy in the trigger specification of the Azure Event Hub scaler.
e.g.
triggers:
- type: azure-eventhub
metadata:
connectionFromEnv: EVENTHUB_CONNECTIONSTRING_ENV_NAME
storageConnectionFromEnv: STORAGE_CONNECTIONSTRING_ENV_NAME
consumerGroup: $Default
unprocessedEventThreshold: '64'
blobContainer: 'name_of_container'
goSdkPrefix: 'someDir/dapr-dapreventhubname-daprconsumergroup-'
checkpointStrategy: goSdk
The search path for the checkpoint file would look like:
https://myaccount.blob.core.windows.net/name_of_container/someDir/dapr-dapreventhubname-daprconsumergroup-0
Use-Case
Allows usage of Keda together with DAPR and Azure Event Hub.
Dapr creates files on the blob storage which are named
{blobContainer}/dapr-{topic}-{consumerGroup}-{partitionId}
.
Keda on the other hand looks for
{blobContainer}/{partiotionId}
.
Anything else?
Even though I don't know go yet I would try myself on a PR.
There are two options here, we can do either or both:
- Support a prefix/format, potentially requiring value injection as shown above
- Support
dapr
checkpointing, but then this same problem might come up later on with another product
Thoughts @kedacore/keda-core-contributors?
I would go with the value injection approach as it leaves room for customization.
That is a good option, but doing both might be good as well given every Dapr end-user would otherwise have to do this customization which is less trivial from a UX standpoint.
A suggestion:
triggers:
- type: azure-eventhub
metadata:
connectionFromEnv: EVENTHUB_CONNECTIONSTRING_ENV_NAME
storageConnectionFromEnv: STORAGE_CONNECTIONSTRING_ENV_NAME
consumerGroup: $Default
unprocessedEventThreshold: '64'
blobContainer: 'name_of_container'
goSdkTemplate: 'dapr'
checkpointStrategy: goSdk
-> {blobContainer}/dapr-{topic}-{consumerGroup}-{partitionId}
triggers:
- type: azure-eventhub
metadata:
connectionFromEnv: EVENTHUB_CONNECTIONSTRING_ENV_NAME
storageConnectionFromEnv: STORAGE_CONNECTIONSTRING_ENV_NAME
consumerGroup: $Default
unprocessedEventThreshold: '64'
blobContainer: 'name_of_container'
goSdkTemplate: 'prefix'
goSdkPrefix: 'someDir/somePrefix-'
checkpointStrategy: goSdk
-> {blobContainer}/someDir/somePrefix-{partitionId}
If we go that route, I wouldn't use goSdk*
but something more generic in case we want to do this for other checkpoint strategies as well.
How about checkpointStrategyTemplate
and chekpointPrefix
?
Fine by me. Thoughts @zroubalik & @JorTurFer? I'd add a Dapr
strategy as well though.
I agree with @tomkerkhove adding a Dapr
strategy because it has value itself and for future product integrations, we can support also the checkpointStrategyTemplate/chekpointPrefix
Are you willing to still contribute this @brainslush?
I'll try. Haven't done coding in Go before but when I read the code I got the impression that this shouldn't be difficult.
Hi,
As i checked the dapr behavior with container apps and the eventhub scaler, a view month ago, it worked well with the goSdk. But the eventhub component was rewritten:
https://github.com/dapr/components-contrib/pull/1292/commits/29e22a0d4308a1de861271a0fba1ec48fe737555#diff-3a38d7491c58638ad54630db869afd8fbd80db683f62dc62f18a9c5b214b0962
which is good, we are waiting for some changes too :-)
So, I'm with @tomkerkhove. from the end-user perspective dapr is more concrete and readable. For example, if you create a container app you don't want to configure things like a template path. This is an implementation detail.
I would focus on some concrete checkpoint strategy implementations and not a generic solution that lifts the complexity into the scaler configuration. Most of the eventhub processor implementations like in c#, Azure Functions, java, javascript, or python switched over to the metadata checkpoint strategy. So the gosdk variant is currently a very specific one.
just my opinion. I think If there is a need for a very generic solution, it could be added later. So i guess the dapr one is more important.
@brainslush, i started last year with the checkpoint strategy variant. If you need help or want that i jump in, feel free to ask!
I'm currently a bit busy but I still intend to do it. @christle But, in case I get stuck I'll get back to you. I agree that implementing a specific strategy is the best solution. I had a chat with the dapr developers and they assured me that they intend to keep the path and behavior as it is right now.
Created https://github.com/kedacore/keda/issues/3141 for Dapr checkpointing
FYI: I've created a PR on Dapr to bring that topic forward: https://github.com/dapr/components-contrib/pull/1940
Does that mean we will only support the new Dapr version or not? (just to verify)
For older versions, the goSdk Checkpoint Strategy works like before. But since January 2022, this strategy doesn't work anymore for pubsub because of the naming chance. But still for bindings.
With the small dapr change, for new dapr versions, a new checkpoint strategy "dapr" is needed. For older versions, the goSdk checkpoint strategy works well.
Oh boy, this will be fun for support cases - Thanks for following up and let's make sure we have good docs on this :)
I think this is already done. We can reopen it if not
Related: https://github.com/kedacore/keda/issues/3141