
GCS Set a default policy to retry failures

Open tigrux opened this issue 11 months ago • 3 comments

GCS: Add properties to configure retry policy

GCS may encounter recoverable errors, for example, an authentication error. By default, GCS keeps retrying for up to 15 minutes, but this default may be excessive and can give the impression that Velox has become unresponsive. The GCS client allows this behaviour to be configured, either by limiting the time spent retrying or by limiting the number of retries. However, the GCS connector currently exposes neither the retry time nor the retry count as a setting.

This change introduces two new properties:

  • hive.gcs.max-retry-count: integer. The maximum number of times to retry a transient error.
  • hive.gcs.max-retry-time: integer. The maximum time (in seconds) to keep retrying transient errors.
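
For illustration only, here is a minimal, self-contained sketch of how the two limits could bound a retry loop. It is not the code in this PR and does not use the GCS client library: `RetryLimits`, `parseRetryLimits`, and `runWithRetries` are hypothetical names, the config is modelled as a plain string map, and the 15-minute fallback simply mirrors the default mentioned above. It also assumes that whichever limit is reached first ends the retries.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical holder for the two limits; not a Velox or GCS client type.
struct RetryLimits {
  std::optional<int> maxRetryCount;                  // hive.gcs.max-retry-count
  std::optional<std::chrono::seconds> maxRetryTime;  // hive.gcs.max-retry-time
};

// Reads the two properties from a plain key/value map (assumed config shape).
RetryLimits parseRetryLimits(
    const std::unordered_map<std::string, std::string>& config) {
  RetryLimits limits;
  if (auto it = config.find("hive.gcs.max-retry-count"); it != config.end()) {
    limits.maxRetryCount = std::stoi(it->second);
  }
  if (auto it = config.find("hive.gcs.max-retry-time"); it != config.end()) {
    limits.maxRetryTime = std::chrono::seconds(std::stoll(it->second));
  }
  return limits;
}

// Calls attempt() until it succeeds or a limit is reached.
// `attempts` receives the total number of calls made (initial try + retries).
// A real client would also back off between attempts; omitted here.
bool runWithRetries(
    const RetryLimits& limits,
    const std::function<bool()>& attempt,
    int& attempts) {
  assert(attempt && "attempt callback must be set");
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  // Assumption: fall back to 15 minutes when no time limit is configured.
  const std::chrono::seconds timeBudget =
      limits.maxRetryTime.value_or(std::chrono::seconds(900));

  attempts = 0;
  int retries = 0;
  while (true) {
    ++attempts;
    if (attempt()) {
      return true;
    }
    // Stop once the configured number of retries has been used up.
    if (limits.maxRetryCount && retries >= *limits.maxRetryCount) {
      return false;
    }
    // Stop once the time budget has elapsed.
    if (Clock::now() - start >= timeBudget) {
      return false;
    }
    ++retries;
  }
}
```

In this sketch, a retry count of 3 means one initial attempt plus at most three retries, and a retry time of 0 means the operation is attempted once and never retried.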

Fixes #9264

tigrux avatar Mar 13 '24 16:03 tigrux

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 207b5f23ac4e12561ece8b528717afde71a128d4
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/663171804a6569000879ba2b

netlify[bot] avatar Mar 13 '24 16:03 netlify[bot]

Hello @kgpai and @majetideepak. We recently found that the GCS connector does not allow the retry policy to be configured. This PR adds two properties to configure it.

tigrux avatar Mar 26 '24 21:03 tigrux

cc: @majetideepak

kgpai avatar Mar 27 '24 18:03 kgpai

@tigrux some comments. Can we add a unit test for these configs?

I had to test manually; I do not know how to trigger a failure from the simulator.

tigrux avatar Apr 18 '24 16:04 tigrux

@majetideepak @kgpai I addressed most of the feedback. However, I was unable to add unit tests (I could not get the simulator to trigger retries), so I added a note stating that I tested manually.

tigrux avatar Apr 19 '24 17:04 tigrux