
Dapr kafka endpoint for azure eventhub receiving duplicated messages

Open pyjads opened this issue 2 years ago • 6 comments

I am using the Dapr Kafka endpoint for Azure Event Hubs. I have created 3 partitions in the event hub and have 3 pods running in Kubernetes to read events from it. When I publish 3 events, each event is received by all 3 consumers.

Every message is processed by all three consumers, which wastes resources and duplicates output. Sometimes an event is even consumed twice or three times by the same consumer.

Each event takes approximately 4 minutes to process.

All three consumers belong to the same consumer group.

Below are my YAML manifests:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.kafka
  version: v1
  metadata:
    - name: brokers
      value: "my_eventhub_namespace_name.servicebus.windows.net:9093"
    - name: consumerGroup
      value: "consumer1"
    - name: authRequired
      value: "true"
    - name: authType
      value: "password"
    - name: saslUsername
      value: $ConnectionString
    - name: saslPassword
      secretKeyRef:
        name: eventhub-connection-key
        key: eventhub-connection-key
    - name: version # Optional.
      value: 1.0.0
    - name: initialOffset
      value: "oldest"
    - name: consumeRetryInterval
      value: 6s
auth:
  secretStore: secretref
---
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: consumer-resiliency
spec:
  policies:
    retries:
      pubsubRetry:
        policy: constant
        duration: 10s
        maxRetries: 1
  targets:
    components:
      pubsub:
        inbound:
          retry: pubsubRetry
---
apiVersion: dapr.io/v1alpha1
kind: Subscription
metadata:
  name: consumer
spec:
  topic: dev
  route: /process
  pubsubname: pubsub
  deadLetterTopic: poison-message
scopes:
- consumer-dev
---
apiVersion: dapr.io/v1alpha1
kind: Subscription
metadata:
  name: recorder-consumer-deadletter
spec:
  topic: poison-message
  route: /dead
  pubsubname: pubsub
scopes:
- consumer-dev
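For context, here is a minimal sketch of the handler that would sit behind the `/process` route, assuming a Python service. The `handle_event` and `process` names are illustrative; the `SUCCESS`/`RETRY`/`DROP` statuses follow Dapr's pub/sub response contract for acking, retrying, or dead-lettering a message.

```python
# A sketch of the application-side handler that Dapr POSTs to at /process.
# Dapr's HTTP pub/sub contract: respond {"status": "SUCCESS"} to ack,
# {"status": "RETRY"} to redeliver, {"status": "DROP"} to dead-letter.
def handle_event(cloud_event: dict) -> dict:
    data = cloud_event.get("data")
    if data is None:
        return {"status": "DROP"}   # malformed event: send to dead-letter topic
    try:
        process(data)               # hypothetical long-running work (~4 min)
    except Exception:
        return {"status": "RETRY"}  # let the resiliency policy retry delivery
    return {"status": "SUCCESS"}

def process(data) -> None:
    # placeholder for the real ~4-minute job
    pass
```

Returning `RETRY` here is what the `pubsubRetry` resiliency policy above would act on.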

pyjads avatar Jun 28 '23 11:06 pyjads

This is how Event Hubs works, with or without Kafka: it delivers messages to all subscribers.

If you want only one subscriber to receive each message, I would recommend looking into a different broker, such as Azure Service Bus in queues mode (in topics mode, all subscribers receive a message).

ItalyPaleAle avatar Jun 28 '23 17:06 ItalyPaleAle

@ItalyPaleAle All three subscribers belong to the same consumer group. When using the official library (EventHubConsumerClient), it load-balances and ensures only one subscriber registers to each partition.

If there are multiple subscribers, only one can take ownership of a given partition; only in rare cases will events be reprocessed.

Sources: https://stackoverflow.com/questions/61260996/azure-event-hub-there-can-be-at-most-5-concurrent-readers-on-a-partition-per-co

https://learn.microsoft.com/en-us/azure/event-hubs/event-processor-balance-partition-load
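The load balancing described in those links can be illustrated with a toy simulation (pure Python, not an actual Event Hubs or Kafka client; the partition and pod names are made up): within one consumer group, each partition is owned by exactly one consumer, so each event should be delivered to exactly one pod.

```python
# Toy simulation of consumer-group partition ownership: with one shared
# group, 3 partitions are divided among 3 consumers, so each event is
# seen by exactly one consumer, not all three.
def assign_partitions(partitions, consumers):
    # round-robin assignment, roughly what a group coordinator does
    ownership = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        ownership[consumers[i % len(consumers)]].append(p)
    return ownership

def deliver(event_partition, ownership):
    # an event on a partition goes only to that partition's owner
    return [c for c, parts in ownership.items() if event_partition in parts]

ownership = assign_partitions([0, 1, 2], ["pod-a", "pod-b", "pod-c"])
print(ownership)              # each pod owns exactly one partition
print(deliver(1, ownership))  # exactly one pod receives the event
```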

pyjads avatar Jun 28 '23 19:06 pyjads

Sometimes, the same subscriber consumes the same message more than once in a row.

pyjads avatar Jun 28 '23 19:06 pyjads

@skyao @DeepanshuA @mukundansundar you are a lot more knowledgeable about Kafka and Event Hubs since you've been working on it recently. Could you please take a look?

@pyjads can you confirm which version of Dapr you're using?

ItalyPaleAle avatar Jun 29 '23 14:06 ItalyPaleAle

I am using version 1.10.7.

Duplicate processing occurs very frequently. Could it be happening because of the time it takes to process each event, which is approximately 4 minutes?

As a workaround, I have to maintain state by computing a hash of each message, storing it in a database, and checking whether the event was already processed.
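That deduplication workaround can be sketched roughly like this (an in-memory set stands in for the database table, and SHA-256 over the JSON-serialized body is an assumed hash choice):

```python
import hashlib
import json

# In-memory stand-in for the database table of processed-message hashes.
processed = set()

def message_hash(payload: dict) -> str:
    # Stable hash of the message body (sorted keys so equal payloads match).
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def handle_once(payload: dict) -> bool:
    """Process the payload only if unseen; return True if it was processed."""
    h = message_hash(payload)
    if h in processed:
        return False      # duplicate delivery: skip the work
    processed.add(h)      # in production: an atomic insert into the DB
    # ... do the ~4 minutes of real work here ...
    return True
```

Note this makes the handler idempotent, which is generally needed anyway since pub/sub delivery is at-least-once.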

pyjads avatar Jun 29 '23 20:06 pyjads

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 29 '23 21:07 github-actions[bot]