[Feature] Dynamic reading and writing to avoid failures due to network/IO system overload
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Library/core
Feature
Long-running jobs are often killed by read/write failures caused by I/O overload. Reads and writes are also constrained by network limits such as available bandwidth.
To reduce the number of long-running jobs lost this way, I am requesting a feature for dynamic reading and writing, including:
- Random backoff mechanism to relieve I/O pressure (I/O spread factor).
- Readjusting read/write rates based on COS/file system response.
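
To make the first item concrete, here is a minimal sketch of a randomized ("full jitter") exponential backoff around an I/O call. It is only an illustration of the idea, not the library's existing retry logic: `backoff_delay`, `retry_io`, and the parameters `base`, `cap`, and `max_attempts` are hypothetical names, and treating `OSError` as the retryable failure is an assumption.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


def retry_io(op: Callable[[], T], max_attempts: int = 5, base: float = 0.5,
             cap: float = 30.0, sleep: Callable[[float], None] = time.sleep,
             rng: Optional[random.Random] = None) -> T:
    """Call op(), retrying on OSError with a randomized, growing delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except OSError:
            # Assumption: OSError stands in for a transient read/write failure.
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

Because each worker draws its own random delay, retries from many workers spread out over time instead of hitting the storage backend at the same moment, which is the intent behind the "I/O spread factor".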
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Random backoff mechanism to relieve I/O pressure (I/O spread factor).
We already have two-level retries.
Readjusting read/write rates based on COS/file system response.
This is far more complex. I am not sure how realistic it is.
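
For discussion, one simple way to adjust rates from storage responses is additive-increase/multiplicative-decrease (AIMD): raise a concurrency or rate limit slowly while requests succeed, and cut it sharply when the backend signals overload. The sketch below is an assumption about what such a controller could look like, not an existing API. `AimdLimit` and all of its parameters are hypothetical, and deciding what counts as an overload signal from COS or the file system is left to the caller.

```python
class AimdLimit:
    """Hypothetical AIMD controller for a read/write concurrency limit."""

    def __init__(self, initial: int = 8, minimum: int = 1, maximum: int = 64,
                 increase: int = 1, decrease: float = 0.5) -> None:
        if not (minimum <= initial <= maximum):
            raise ValueError("initial must lie between minimum and maximum")
        if not (0.0 < decrease < 1.0):
            raise ValueError("decrease must be in (0, 1)")
        self.minimum = minimum
        self.maximum = maximum
        self.increase = increase
        self.decrease = decrease
        self.limit = initial

    def on_success(self) -> int:
        """Additive increase after a successful read or write."""
        self.limit = min(self.maximum, self.limit + self.increase)
        return self.limit

    def on_overload(self) -> int:
        """Multiplicative decrease after a throttling or timeout response."""
        self.limit = max(self.minimum, int(self.limit * self.decrease))
        return self.limit


limit = AimdLimit(initial=8)
limit.on_overload()   # backend signalled overload: limit drops to 4
limit.on_success()    # request succeeded: limit creeps back up to 5
```

The caller would use `limit.limit` to bound the number of in-flight reads or writes. This keeps the mechanism small, but it still needs a reliable overload signal from each storage backend, which is probably where most of the complexity lies.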