pghoard icon indicating copy to clipboard operation
pghoard copied to clipboard

Add more retries in case of download failure

Open rikonen opened this issue 2 years ago • 5 comments

In some rare cases 3 retries isn't enough but 6 is. This should really be made configurable but creating this pr as a quick fix in case nobody else feels like creating a better one.

rikonen avatar Oct 09 '22 12:10 rikonen

Those tests look somewhat unrelated since the failing tests don't even import the code that was changed. But looks like the failures are consistent and main hasn't failed with the same, though last test run for main was a long time ago so could be something has broken since.

rikonen avatar Oct 11 '22 07:10 rikonen

It looks like the reasons the tests are failing are related to https://github.com/aiven/rohmu/pull/69.

A minor detail but it would be better to write the code differently: either use an else clause in the loop instead of testing for the "range - 1" attempt, or make it obvious. Eg, either:

max_retry = 6
for i in range(max_retry):
    ....
else:
    self.log.error("Download stalled despite retries, aborting")
    self.errors = 1

Or:

max_retry = 6
for i in range(max_retry):
   if i == max_retry -1:
       ...

I'm in favor of the first one.

rdunklau avatar Oct 11 '22 08:10 rdunklau

I'll revert the change to rohmu, it was creating some issues in some dev environments as well.

alanfranz avatar Oct 11 '22 12:10 alanfranz

Added a constant for max retries to get rid of the magic numbers spread around the code.

rikonen avatar Oct 14 '22 12:10 rikonen

@rikonen I encountered a similar issue in this escalation, but I also had to increase the allowed stalled time to get it to complete.

ryansammonaiven avatar Oct 15 '22 17:10 ryansammonaiven

rohmu with reverted change needs to be released in PyPi so that tests are passing, I guess @alanfranz is working on it.

alexole avatar Oct 18 '22 08:10 alexole

Codecov Report

Merging #557 (1693152) into main (3be6acf) will increase coverage by 0.04%. The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #557      +/-   ##
==========================================
+ Coverage   91.28%   91.33%   +0.04%     
==========================================
  Files          32       32              
  Lines        4671     4672       +1     
==========================================
+ Hits         4264     4267       +3     
+ Misses        407      405       -2     
Impacted Files Coverage Δ
pghoard/restore.py 89.61% <100.00%> (-0.49%) :arrow_down:
pghoard/webserver.py 89.19% <0.00%> (+0.22%) :arrow_up:
pghoard/pghoard.py 85.03% <0.00%> (+0.65%) :arrow_up:

codecov[bot] avatar Jan 30 '23 17:01 codecov[bot]