components-contrib
Dapr Kafka endpoint for Azure Event Hubs receiving duplicated messages
I am using the Dapr Kafka endpoint for Azure Event Hubs. I have created 3 partitions in the event hub, and I have 3 pods running in Kubernetes to read events from it. When I publish 3 events, each event is received by all 3 consumers.
Every message is processed by all three consumers, which wastes resources and duplicates output. Sometimes an event is even consumed twice or three times by the same consumer.
Each event takes approximately 4 minutes to process.
All three consumers belong to the same consumer group.
Below is my YAML file:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.kafka
  version: v1
  metadata:
  - name: brokers
    value: "my_eventhub_namespace_name.servicebus.windows.net:9093"
  - name: consumerGroup
    value: "consumer1"
  - name: authRequired
    value: "true"
  - name: authType
    value: "password"
  - name: saslUsername
    value: "$ConnectionString"
  - name: saslPassword
    secretKeyRef:
      name: eventhub-connection-key
      key: eventhub-connection-key
  - name: version # Optional.
    value: "1.0.0"
  - name: initialOffset
    value: "oldest"
  - name: consumeRetryInterval
    value: "6s"
auth:
  secretStore: secretref
---
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: consumer-resiliency
spec:
  policies:
    retries:
      pubsubRetry:
        policy: constant
        duration: 10s
        maxRetries: 1
  targets:
    components:
      pubsub:
        inbound:
          retry: pubsubRetry
---
apiVersion: dapr.io/v1alpha1
kind: Subscription
metadata:
  name: consumer
spec:
  topic: dev
  route: /process
  pubsubname: pubsub
  deadLetterTopic: poison-message
scopes:
- consumer-dev
---
apiVersion: dapr.io/v1alpha1
kind: Subscription
metadata:
  name: recorder-consumer-deadletter
spec:
  topic: dead-message
  route: /dead
  pubsubname: pubsub
scopes:
- consumer-dev
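(For context: the `route: /process` above points at an HTTP endpoint on the app side, and the response from that endpoint decides whether Dapr acks the message or redelivers it under the `pubsubRetry` policy. The sketch below is an illustration only, not code from this report; FastAPI and the handler names are assumptions.)

```python
# A sketch of the endpoint that the Subscription above routes to.
# Dapr treats a 2xx response (optionally {"status": "SUCCESS"}) as an ack;
# a non-2xx response or {"status": "RETRY"} causes redelivery.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/process")
async def process(request: Request):
    event = await request.json()  # Dapr delivers messages wrapped in a CloudEvent
    try:
        handle(event["data"])     # the ~4-minute job described in the report
        return {"status": "SUCCESS"}
    except Exception:
        # Hands the message back to Dapr for redelivery per the Resiliency policy.
        return {"status": "RETRY"}

def handle(data):
    ...  # application logic (placeholder)
```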
This is how Event Hubs works, with or without Kafka: it delivers messages to all subscribers.
If you want only one subscriber to receive each message, I would recommend looking into a different broker, such as Azure Service Bus in queues mode (in "topics" mode, all subscribers receive a message).
@ItalyPaleAle All three subscribers belong to the same consumer group. When using the official library (EventHubConsumerClient), it load-balances and ensures that only one subscriber registers to each partition.
If there are multiple subscribers, only one subscriber can take ownership of a partition at a time; only in rare cases will events be reprocessed. A sketch of this behavior with the official SDK follows the sources below.
Sources: https://stackoverflow.com/questions/61260996/azure-event-hub-there-can-be-at-most-5-concurrent-readers-on-a-partition-per-co
https://learn.microsoft.com/en-us/azure/event-hubs/event-processor-balance-partition-load
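(For illustration only: a minimal sketch of the SDK behavior described above, assuming the `azure-eventhub` and `azure-eventhub-checkpointstoreblob` Python packages. All connection strings and names are placeholders; the consumer group and hub name are taken from the YAML earlier in the thread.)

```python
# With a shared checkpoint store, multiple EventHubConsumerClient instances
# in the same consumer group coordinate ownership so that each partition is
# read by exactly one consumer at a time.
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<STORAGE_CONNECTION_STRING>",   # placeholder
    container_name="checkpoints",    # placeholder
)

client = EventHubConsumerClient.from_connection_string(
    "<EVENT_HUB_CONNECTION_STRING>",  # placeholder
    consumer_group="consumer1",       # from the component YAML above
    eventhub_name="dev",              # the Kafka topic maps to the hub name
    checkpoint_store=checkpoint_store,
)

def on_event(partition_context, event):
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
    # Checkpointing records progress so that after a rebalance the new owner
    # resumes from here instead of reprocessing from the beginning.
    partition_context.update_checkpoint(event)

with client:
    # Blocks; partitions are distributed across all running instances.
    client.receive(on_event=on_event, starting_position="-1")
```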
Sometimes, the same subscriber consumes the same message more than once in a row.
@skyao @DeepanshuA @mukundansundar you are a lot more knowledgeable about Kafka and Event Hubs since you've been working on them recently. Could you please take a look?
@pyjads can you confirm which version of Dapr you're using?
I am using version 1.10.7.
Duplicate processing occurs very frequently. Could it be happening because of the time it takes to process each event, which is approximately 4 minutes?
For now, I have to maintain state myself: I compute a hash of each message, store it in a database, and check whether the event was already processed (roughly like the sketch below).
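(A rough sketch of that hash-based de-duplication as described; Redis and the key format are illustrative assumptions, not the poster's actual implementation.)

```python
# De-duplicate deliveries by recording a hash of each processed message.
import hashlib
import redis

store = redis.Redis(host="localhost", port=6379)  # placeholder connection

def already_processed(payload: bytes) -> bool:
    digest = hashlib.sha256(payload).hexdigest()
    # SET with nx=True succeeds only for the first writer, so concurrent
    # consumers of a duplicate delivery race safely; expire after 24h.
    is_first = store.set(f"processed:{digest}", 1, nx=True, ex=86400)
    return not is_first

def handle(payload: bytes):
    if already_processed(payload):
        return  # drop the duplicate
    process(payload)  # the ~4-minute job

def process(payload: bytes):
    ...  # application logic (placeholder)
```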
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.