kafka-connect-hdfs

Recovery mechanism for the temp file closure failures with state stuck at SHOULD_ROTATE.

Open kaushiksrinivas opened this issue 5 years ago • 5 comments

This patch adds a retry mechanism for temp file closure failures, which can happen for many reasons on the HDFS side and leave the TopicPartitionWriter state machine stuck in SHOULD_ROTATE. The patch does the following:

  1. Safeguards temp file closure by catching any exception raised in the SHOULD_ROTATE state, as well as during the last phase, where the buffer is empty and it needs to be flushed out.
  2. Adds a private method startRecovery(), which clears all existing buffers and counters and then resets the state machine to the RECOVERY_STARTED state. This ensures that data which is buffered but not yet written is cleared out. The subsequent poll() calls write(), where the state machine level is checked, and the partition is recovered successfully and resumes writing records to HDFS instead of failing on temp file closure over and over again.

As of the initial commit, the retry count is hard-coded to 3; it can be made a configurable parameter (connector config) if this PR gets approval. The Java test code has also been commented out for build purposes, and I will need more input on how to modify the tests. Besides making the number of retries configurable, we can also provide a sleep timeout before starting the recovery process, in case any cleanup is needed on the HDFS side for the connector to continue working.
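The flow described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code from the patch: the class `RotateRecoverySketch`, the `TempFileCloser` hook, the `WRITE_STARTED` "normal writing" state, and the exact way failures are counted are all assumptions made for the example. Only `SHOULD_ROTATE`, `RECOVERY_STARTED`, `startRecovery()`, and the retry count of 3 come from the description.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical, simplified stand-in for TopicPartitionWriter; not the connector's real code. */
final class RotateRecoverySketch {

    /** The two states named in the description, plus an assumed normal "writing" state. */
    enum State { RECOVERY_STARTED, WRITE_STARTED, SHOULD_ROTATE }

    /** Hypothetical hook standing in for closing the temp file on HDFS. */
    interface TempFileCloser {
        void close() throws IOException;
    }

    /** Mirrors the hard-coded retry count mentioned in the description. */
    static final int DEFAULT_MAX_CLOSE_RETRIES = 3;

    private final Queue<String> buffer = new ArrayDeque<>();
    private final TempFileCloser closer;
    private final int maxCloseRetries;
    private State state = State.WRITE_STARTED;
    private int recordCounter = 0;
    private int closeFailures = 0;

    RotateRecoverySketch(TempFileCloser closer, int maxCloseRetries) {
        this.closer = closer;
        this.maxCloseRetries = maxCloseRetries;
    }

    /** Buffers a record, as a poll() delivering data would. */
    void buffer(String record) {
        buffer.add(record);
        recordCounter++;
    }

    /** Simulates the rotation condition being met. */
    void markShouldRotate() {
        state = State.SHOULD_ROTATE;
    }

    /** Called once per poll(); checks the current state before doing any work. */
    void write() {
        if (state == State.RECOVERY_STARTED) {
            // Real recovery steps are omitted; the sketch just resumes writing.
            state = State.WRITE_STARTED;
            return;
        }
        if (state != State.SHOULD_ROTATE) {
            return;
        }
        try {
            closer.close();
            closeFailures = 0;
            state = State.WRITE_STARTED; // commit steps omitted in this sketch
        } catch (IOException e) {
            closeFailures++;
            if (closeFailures >= maxCloseRetries) {
                startRecovery();
            }
            // Otherwise stay in SHOULD_ROTATE and retry on the next poll().
        }
    }

    /** Clears buffers and counters, then resets the state machine. */
    private void startRecovery() {
        buffer.clear();
        recordCounter = 0;
        closeFailures = 0;
        state = State.RECOVERY_STARTED;
    }

    State state() {
        return state;
    }

    int bufferedCount() {
        return buffer.size();
    }

    public static void main(String[] args) {
        // A closer that always fails, to exercise the recovery path.
        RotateRecoverySketch writer = new RotateRecoverySketch(
            () -> { throw new IOException("simulated HDFS close failure"); },
            DEFAULT_MAX_CLOSE_RETRIES);
        writer.buffer("record-1");
        writer.markShouldRotate();
        for (int poll = 1; poll <= 4; poll++) {
            writer.write();
            System.out.println("poll " + poll + ": " + writer.state());
        }
    }
}
```

In this sketch the writer stays in SHOULD_ROTATE for the first two failed closes, moves to RECOVERY_STARTED on the third, and returns to writing on the next poll. Passing the retry limit through the constructor is one way the hard-coded value could later be replaced by a connector config setting.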

kaushiksrinivas avatar Jun 26 '19 17:06 kaushiksrinivas

It looks like @kaushiksrinivas hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence. Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

ghost avatar Jun 26 '19 17:06 ghost

[clabot:check]

kaushiksrinivas avatar Jun 26 '19 18:06 kaushiksrinivas

@confluentinc It looks like @kaushiksrinivas just signed our Contributor License Agreement. :+1:

Always at your service,

clabot

ghost avatar Jun 26 '19 18:06 ghost

I think this is trying to solve the same problem as #501

gharris1727 avatar Jun 26 '20 23:06 gharris1727

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

cla-assistant[bot] avatar Aug 27 '23 12:08 cla-assistant[bot]