lakeFS
[lakeFSFS] Add retries and configurable timeouts to lakeFS API calls
As we did in the Spark metadata client for GC. If an API call times out and the exception leaks, it can be really expensive on Spark! First the entire job is retried, which can cause partitions to be recomputed. And if it times out enough times, the entire job is aborted and pretty much all work is lost. Note that when lakeFS is under load, this gets worse with more partitions rather than better :-/
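To make the request concrete, here is a rough sketch of the kind of retry wrapper meant here, written in Python for brevity rather than in the language of lakeFSFS itself. Everything in it is an assumption for illustration: `ApiTimeout`, `call_with_retries`, and the parameters `max_attempts`, `base_delay` and `max_delay` are hypothetical names, not part of any lakeFS client. It retries a timed-out call with jittered exponential backoff, so the exception only reaches Spark after the attempts are exhausted.

```python
import random
import time


class ApiTimeout(Exception):
    """Hypothetical stand-in for a timeout raised by a lakeFS API call."""


def call_with_retries(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Run call(), retrying on ApiTimeout with jittered exponential backoff.

    All parameter names and defaults are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiTimeout:
            if attempt == max_attempts:
                # Out of attempts: let the failure propagate to the caller.
                raise
            # The delay ceiling doubles on each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many partitions from retrying in lockstep
            # against an already loaded server.
            sleep(random.uniform(0, cap))
```

The jitter matters for the "more partitions makes it worse" case: without it, every task that timed out together would retry together.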
This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.
This is blocked by #5110 (!)