
improve resilience against transient errors

Open · ryane opened this issue 9 years ago · 0 comments

converge is currently sensitive to transient errors, particularly when performing operations over the network such as downloading files, keys, Docker images, etc. Often, simply running converge again immediately after a failure results in a successful application.

This issue is for tracking ideas on how we can improve the resilience of converge against ephemeral errors. Some ideas:

  • Build simple retry logic (perhaps with an exponential backoff) into the core engine
  • Resources can implement their own specialized retry logic if needed
  • Resources should be able to opt-out if it is undesirable or unsafe to retry
  • Some control of the retry behavior can be exposed to users via HCL parameters

Additional thoughts / ideas?

ryane, Nov 21 '16 17:11