Kafka producer retry

Open djordje-mijatovic opened this issue 2 years ago • 2 comments

Checklist

  • [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Documentation: https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html

Added properties:

retries (Default = 2147483647) Setting a value greater than zero causes the client to resend any record whose send fails with a potentially transient error. This retry is no different from the client resending the record itself upon receiving the error. Produce requests fail before the retries are exhausted if the timeout configured by delivery.timeout.ms expires before a successful acknowledgement. Users should generally leave this config unset and use delivery.timeout.ms to control retry behavior instead.

retry.backoff.ms (Default = 100) The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios.

request.timeout.ms (Default = 30000) The maximum amount of time the client will wait for the response to a request. If the response is not received before the timeout elapses, the client resends the request if necessary, or fails the request if retries are exhausted. This should be larger than replica.lag.time.max.ms (a broker configuration) to reduce the possibility of message duplication due to unnecessary producer retries.

delivery.timeout.ms (Default = 120000) An upper bound on the time to report success or failure after a call to send() returns. This limits the total time that a record will be delayed prior to sending, the time to await acknowledgement from the broker (if expected), and the time allowed for retriable send failures. The producer may report failure to send a record earlier than this config if an unrecoverable error is encountered, the retries have been exhausted, or the record is added to a batch which reached an earlier delivery expiration deadline. The value of this config should be greater than or equal to the sum of request.timeout.ms and linger.ms.
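To make the relationship between these four properties concrete, here is a minimal Python sketch. It is not code from this PR: the helper `build_producer_config`, its `linger_ms` parameter, and the assumed `linger.ms` value of 0 are hypothetical and used only for illustration. It merges overrides onto the defaults listed above and checks the one constraint the description states, namely that delivery.timeout.ms should be at least request.timeout.ms plus linger.ms.

```python
# Defaults as quoted in the property descriptions above.
PRODUCER_DEFAULTS = {
    "retries": 2147483647,
    "retry.backoff.ms": 100,
    "request.timeout.ms": 30000,
    "delivery.timeout.ms": 120000,
}


def build_producer_config(overrides=None, linger_ms=0):
    """Hypothetical helper: merge overrides onto the defaults and validate them.

    linger_ms is an assumption for this sketch (0 unless the caller says
    otherwise); a "linger.ms" key in overrides takes precedence over it.
    """
    config = dict(PRODUCER_DEFAULTS)
    config["linger.ms"] = linger_ms
    config.update(overrides or {})

    # Constraint from the description of delivery.timeout.ms:
    # it should be >= request.timeout.ms + linger.ms.
    minimum = config["request.timeout.ms"] + config["linger.ms"]
    if config["delivery.timeout.ms"] < minimum:
        raise ValueError(
            "delivery.timeout.ms (%d) must be >= request.timeout.ms + linger.ms (%d)"
            % (config["delivery.timeout.ms"], minimum)
        )
    return config


if __name__ == "__main__":
    # Shorter overall deadline, keeping the constraint satisfied.
    cfg = build_producer_config(
        {"request.timeout.ms": 10000, "delivery.timeout.ms": 15000}, linger_ms=5
    )
    for key in sorted(cfg):
        print(key, "=", cfg[key])
```

Following the guidance quoted for `retries`, the sketch leaves that value at its default and shortens the delivery deadline instead; lowering delivery.timeout.ms below request.timeout.ms + linger.ms raises a `ValueError`.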

resolve #6377

djordje-mijatovic · Nov 08 '22 09:11

Unit Test Results (build & test)

621 tests     617 :heavy_check_mark:   15m 41s :stopwatch:
157 suites      4 :zzz:
157 files       0 :x:

Results for commit e328b168.

:recycle: This comment has been updated with latest results.

github-actions[bot] · Nov 08 '22 09:11

Overall this looks very nice, thanks!

david-leifker · Nov 30 '22 20:11

Hey @djordje-mijatovic have you had a chance to review David's feedback?

aditya-radhakrishnan · Dec 19 '22 18:12

> Hey @djordje-mijatovic have you had a chance to review David's feedback?

Yes. Everything is fixed.

djordje-mijatovic · Dec 20 '22 10:12