aws-sdk-kotlin

Waiters do not have a configurable retry / timeout strategy

Open cloudshiftchris opened this issue 1 year ago • 1 comments

Describe the bug

When using waiters, the underlying operation may take longer than the hard-coded attempts/timeouts allow. This results in the waiter throwing a TooManyAttemptsException.

For example, RdsClient.waitUntilDBSnapshotAvailable has a hard-coded retry strategy:

    val strategy = StandardRetryStrategy {
        maxAttempts = 20
        tokenBucket = InfiniteTokenBucket
        delayProvider {
            initialDelay = 30_000.milliseconds
            scaleFactor = 1.5
            jitter = 1.0
            maxBackoff = 120_000.milliseconds
        }
    }

When a large snapshot takes longer than the ~17m the retry strategy allows for, the waiter fails. This is unexpected because a) the operation is still in progress (in our case, it completed successfully after ~30m) and b) waitUntilDBSnapshotAvailable isn't documented as having a timeout and doesn't offer any way to configure one.
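To show where the ~17m figure plausibly comes from, here is a rough back-of-the-envelope sketch. It rests on assumptions not stated above: that the delay before retry n is min(initialDelay * scaleFactor^n, maxBackoff), that there is no delay before the first attempt, that jitter = 1.0 scales each delay by a random factor between 0 and 1 (so the average is half the nominal value), and that the time spent in the describe calls themselves is negligible. The function name is made up for illustration.

```kotlin
import kotlin.math.min
import kotlin.math.pow

// Hypothetical helper: sums the un-jittered delays between attempts, in seconds.
// Defaults mirror the hard-coded waiter strategy quoted above.
fun nominalTotalDelaySeconds(
    maxAttempts: Int = 20,
    initialDelaySeconds: Double = 30.0,
    scaleFactor: Double = 1.5,
    maxBackoffSeconds: Double = 120.0,
): Double =
    // Assumption: one delay between each pair of consecutive attempts,
    // so 20 attempts means 19 delays.
    (0 until maxAttempts - 1).sumOf { i ->
        min(initialDelaySeconds * scaleFactor.pow(i), maxBackoffSeconds)
    }

fun main() {
    val nominal = nominalTotalDelaySeconds()
    println("nominal upper bound: ${nominal / 60} min")
    // Assumption: full jitter halves the delay on average.
    println("average with full jitter: ${nominal / 2 / 60} min")
}
```

Under these assumptions the nominal delays are 30 s, 45 s, 67.5 s, 101.25 s and then 15 delays capped at 120 s, for a total of 2043.75 s (about 34 minutes). Half of that is about 17 minutes, which matches the observed limit on average; individual runs can give up earlier or later.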

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Expected behavior

Expecting that waiters either wait indefinitely (which is the implied behaviour, though not ideal) or accept a retry strategy as a parameter with a default value (or any alternate approach that makes it clear there are timeouts and how to configure them).

The AWS Java SDK allows waiters to be configured: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/waiters.html

Current behavior

As described above, RdsClient.waitUntilDBSnapshotAvailable (and likely other waiters) times out unexpectedly (well, unexpectedly if you haven't read the code).

Steps to Reproduce

Execute a Waiter whose operation takes more than the hard-coded retry/timeout limit.

Possible Solution

Ideally the waiter API would be more up-front about its timeouts: it would let callers configure them, and it would clearly document that a waiter isn't solely waiting for the primary condition (there are also timeouts in play).

As a workaround, I have cobbled together the following to allow coarse-grained configurability:

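    // Re-invokes the waiter when it gives up, so the total wait spans up to
    // 4 waiter runs (the initial run plus 3 retries).
    private suspend fun waitForSnapshot(snapshotId: String) {
        var retries = 0
        while (true) {
            try {
                rdsClient.waitUntilDBSnapshotAvailable(
                    DescribeDbSnapshotsRequest { this.dbSnapshotIdentifier = snapshotId }
                )
                return // snapshot is available
            } catch (e: TooManyAttemptsException) {
                // The waiter exhausted its hard-coded attempts; start it again
                // unless the retry limit has been reached.
                if (retries >= 3) {
                    throw e
                }
                retries++
            }
        }
    }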
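The same idea can be expressed with an overall time budget instead of a retry count. The following is only a sketch of a hypothetical helper (retryWithinBudget is not part of the SDK). It is declared inline so the waiter lambda can call suspend functions when the helper is used from a suspend context, and it uses only the Kotlin standard library.

```kotlin
import kotlin.time.Duration
import kotlin.time.TimeSource

// Hypothetical helper, not an SDK API: re-runs `waiter` whenever it fails with
// an error that `isWaiterTimeout` recognises, until the time budget is spent.
inline fun <T> retryWithinBudget(
    budget: Duration,
    isWaiterTimeout: (Exception) -> Boolean,
    waiter: () -> T,
): T {
    val started = TimeSource.Monotonic.markNow()
    while (true) {
        try {
            return waiter()
        } catch (e: Exception) {
            // Rethrow unrelated errors, and give up once the budget is used up.
            // The budget is only checked between runs, so the total wait can
            // overshoot it by up to one full waiter run.
            if (!isWaiterTimeout(e) || started.elapsedNow() >= budget) throw e
        }
    }
}

// Intended usage inside a suspend function (names taken from the workaround above):
//
//     retryWithinBudget(60.minutes, { it is TooManyAttemptsException }) {
//         rdsClient.waitUntilDBSnapshotAvailable(
//             DescribeDbSnapshotsRequest { dbSnapshotIdentifier = snapshotId }
//         )
//     }
```

This still relies on catching TooManyAttemptsException, so it shares the main drawback of the workaround: the actual timeout stays coarse-grained, in multiples of one waiter run.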

Context

Using the SDK to initiate an RDS snapshot, and wait for it to complete before continuing on with upgrading the database.

AWS SDK for Kotlin version

1.3.42

Platform (JVM/JS/Native)

JVM

Operating system and version

macOS / Linux

cloudshiftchris • Oct 04 '24 22:10