[Feature] Dynamic reading and writing to avoid failures due to network/IO system overload

Open • dhirajjoshi16 opened this issue 1 year ago • 1 comment

Search before asking

  • [X] I searched the issues and found no similar issues.

Component

Library/core

Feature

Long-running jobs are often killed by read/write failures caused by I/O overload. Reads and writes are also constrained by the network, for example by available bandwidth.

To reduce the number of long-running jobs killed by such failures, I am requesting support for dynamic reading and writing, including:

  • Random backoff mechanism to relieve I/O pressure (I/O spread factor).

  • Readjusting read/write rates based on COS/file system response.
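
To make the two items above concrete, here is a minimal sketch of what they could look like. It is only an illustration under stated assumptions: `retry_with_jitter` and `AimdRate` are hypothetical names, not part of the data-prep-kit API, and the defaults (delays, rates, treating `OSError` as the retryable error) are placeholders. The first helper retries with a randomized, exponentially growing delay so that workers spread out their retries. The second adjusts a request rate with an additive-increase / multiplicative-decrease rule, which is one possible way to react to COS/file system responses.

```python
import random
import time


def retry_with_jitter(op, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(OSError,), sleep=time.sleep, rng=random.random):
    """Call op() and retry on errors with randomized exponential backoff.

    Hypothetical helper for illustration; not a data-prep-kit function.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap the exponential window, then pick a random point inside it
            # so concurrent workers do not all retry at the same instant.
            window = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * window)


class AimdRate:
    """Additive-increase / multiplicative-decrease cap on requests per second.

    Hypothetical class for illustration; all defaults are assumptions.
    """

    def __init__(self, initial=16.0, floor=1.0, ceiling=256.0, step=1.0, cut=0.5):
        self.rate = initial
        self.floor, self.ceiling = floor, ceiling
        self.step, self.cut = step, cut

    def on_success(self):
        # Probe for more throughput slowly while the store keeps up.
        self.rate = min(self.ceiling, self.rate + self.step)

    def on_overload(self):
        # Back off sharply on a throttling or timeout response.
        self.rate = max(self.floor, self.rate * self.cut)

    def delay(self):
        """Seconds to wait between requests at the current rate."""
        return 1.0 / self.rate
```

A caller would wrap each read or write in `retry_with_jitter(lambda: ...)`, call `on_success()` or `on_overload()` depending on the outcome, and sleep for `delay()` between requests. Which responses count as overload is left open here, since it depends on the storage backend.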

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

dhirajjoshi16 • Jun 19 '24 14:06

  • Random backoff mechanism to relieve I/O pressure (I/O spread factor): we already have two-level retries.

  • Readjusting read/write rates based on COS/file system response: this is far more complex, and I am not sure how realistic it is.

blublinsky • Jun 20 '24 20:06