
feat: add a RetryInterval setting

Open • QuentinFAIDIDE opened this issue 10 months ago • 1 comment

This PR proposes adding a `retryInterval` setting to pipeline and vertex manifests, with the default set to 1ms.

Reasons to raise `retryInterval` above the default: it reduces log spam on failures, it saves resources when a component stays down for a long time, and in most cases a 1ms interval brings no significant performance gain over a 500ms one. Reasons to keep the default: your components are highly available and scalable, response times are critical, or you already rely on CPU/RAM limits to prevent resource waste.

Retry with backoff may be implemented later (as mentioned in an existing comment), so this change may be reverted in the future. Adding a new parameter for every need may also not be a good habit for usability, so I'm open to finding a compromise instead, or to dropping this PR if the community deems it of little use.

QuentinFAIDIDE • Apr 08 '24 21:04

Sorry for requesting a review: this should have been a draft PR in the first place, to avoid notifying all reviewers. Also, the user documentation for pipeline customization still needs to mention this new parameter.

QuentinFAIDIDE • Apr 08 '24 21:04