numaflow
feat: add a RetryInterval setting
This PR proposes adding a retryInterval setting to pipeline and vertex manifests, with a default of 1ms. Reasons to raise retryInterval above the default include fewer spammy logs on failures, fewer resources wasted during long component outages, and, in most cases, no significant performance gain from a 1ms interval over a 500ms one. Reasons to keep the default include highly available and scalable components, critical response times, or reliance on CPU/RAM limits to prevent resource waste. If retry with backoff is implemented later (as mentioned in an existing comment), this change may be reverted. Adding a parameter for every need is also not a good habit for usability, so I'm open to finding a compromise or dropping this change if the community deems it of little use.
Sorry for requesting review; this should have been a draft PR in the first place, to avoid notifying all reviewers. The user documentation for pipeline customization also still needs to mention the new parameter.