
[BUG] EventHub's retry method always adds 4 seconds to a configured delay

Open adriannowak opened this issue 1 year ago • 6 comments

Describe the bug: The Event Hubs SDK does not retry at the configured rate. Instead of waiting only for the delay returned by AmqpRetryOptions.getDelay(), it adds an extra SERVER_BUSY_WAIT_TIME on top of it for some reason. This value is not configurable and is always 4 seconds. As a consequence, the SDK retries more slowly than expected.

Exception or Stack Trace: N/A

To Reproduce Steps to reproduce the behavior:

The following unit test should pass within 5 seconds (1 initial failure + 2 retries, each attempt timing out after 500 ms, so 3 × 500 ms of timeouts). With the latest version of the SDK, the test takes 16 seconds.

    @Test
    @Timeout(value = 5)
    void withRetryMono() {
        // Arrange
        final String timeoutMessage = "Operation timed out.";
        final AmqpRetryOptions options = new AmqpRetryOptions()
            .setMaxRetries(2)
            .setTryTimeout(Duration.ofMillis(500))
            .setMode(AmqpRetryMode.FIXED);

        final AtomicInteger resubscribe = new AtomicInteger();
        final Mono<AmqpTransportType> neverFlux = TestPublisher.<AmqpTransportType>create().mono()
            .doOnSubscribe(s -> resubscribe.incrementAndGet());

        StepVerifier.create(RetryUtil.withRetry(neverFlux, options, timeoutMessage))
            .expectSubscription()
            .expectErrorSatisfies(error -> assertTrue(error.getCause() instanceof TimeoutException))
            .verify();

        assertEquals(options.getMaxRetries() + 1, resubscribe.get());
    }
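
The timing budget behind this test can be sketched as a small calculation. This is only an illustration: the class `RetryBudget` and the method `worstCase` are hypothetical names, the 800 ms base delay is an assumption (the report does not state which delay the test runs with), and jitter is ignored. It does not try to reproduce the 16 seconds figure; it only shows how a fixed 4 second addition per retry pushes the total past the 5 second limit.

```java
import java.time.Duration;

public class RetryBudget {

    /**
     * Worst-case wall time when every attempt times out:
     * (maxRetries + 1) timeouts plus maxRetries waits of (delay + extra).
     */
    public static Duration worstCase(int maxRetries, Duration tryTimeout,
                                     Duration delay, Duration extra) {
        int attempts = maxRetries + 1;
        Duration timeouts = tryTimeout.multipliedBy(attempts);
        Duration waits = delay.plus(extra).multipliedBy(maxRetries);
        return timeouts.plus(waits);
    }

    public static void main(String[] args) {
        Duration tryTimeout = Duration.ofMillis(500);     // from the test above
        Duration assumedDelay = Duration.ofMillis(800);   // assumption, not stated in the report
        Duration serverBusyWait = Duration.ofSeconds(4);  // the fixed extra wait described above

        // Budget with only the configured delay between retries.
        System.out.println(worstCase(2, tryTimeout, assumedDelay, Duration.ZERO).toMillis() + " ms");
        // Budget when 4 seconds are added to every retry delay.
        System.out.println(worstCase(2, tryTimeout, assumedDelay, serverBusyWait).toMillis() + " ms");
    }
}
```

Under these assumptions the first figure stays below the 5 second timeout and the second does not, which matches the direction of the reported failure.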

Code Snippet:

This breaking change was introduced in this PR

https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/RetryUtil.java#L109

        final Duration delay = options.getDelay().plus(SERVER_BUSY_WAIT_TIME);

Expected behavior:

The SDK retries delivery at the configured pace.

        final Duration delay = options.getDelay();

Information Checklist: Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • [x] Bug Description Added
  • [x] Repro Steps Added
  • [x] Setup information Added

adriannowak • Feb 02 '24 09:02

Thanks for filing this issue, @adriannowak. @conniey could you take a look at this regression?

/cc @Azure/azsdk-sb-java

joshfree • Feb 02 '24 17:02

Hey @joshfree @conniey - Could you please give an update on the issue? When will the fix be available?

adriannowak • Mar 04 '24 10:03

Thanks for reporting this. The SERVER_BUSY_WAIT_TIME should only be applied in cases where the exception is a "ServerBusyException" to align with behaviour in the legacy library.

That PR was part of a retry policy cleanup in 2021.
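
The behaviour described in this comment, adding the extra wait only for server-busy errors, could look roughly like the sketch below. It is not the SDK's code: `BackoffSketch`, `retryDelay`, and the nested `ServerBusyException` are hypothetical stand-ins so the example runs on its own, and how the real library detects a server-busy condition is not shown in this thread.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;

public class BackoffSketch {

    // Fixed extra wait, 4 seconds as stated in the report.
    public static final Duration SERVER_BUSY_WAIT_TIME = Duration.ofSeconds(4);

    /** Hypothetical stand-in for a server-busy error; not the SDK's class. */
    public static class ServerBusyException extends RuntimeException {
        public ServerBusyException(String message) {
            super(message);
        }
    }

    /**
     * Returns the wait before the next retry: the configured delay, plus the
     * server-busy wait only when the last failure was a server-busy error.
     */
    public static Duration retryDelay(Duration configuredDelay, Throwable lastError) {
        return lastError instanceof ServerBusyException
            ? configuredDelay.plus(SERVER_BUSY_WAIT_TIME)
            : configuredDelay;
    }

    public static void main(String[] args) {
        Duration configured = Duration.ofMillis(800); // assumed value, for illustration only

        // A timeout keeps the configured pace.
        System.out.println(retryDelay(configured, new TimeoutException("Operation timed out.")).toMillis() + " ms");
        // A server-busy error backs off by the extra wait.
        System.out.println(retryDelay(configured, new ServerBusyException("server busy")).toMillis() + " ms");
    }
}
```

With a rule like this, the timeout-only test from the report would wait just the configured delay between attempts, while throttled clients would still back off.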

conniey • Mar 14 '24 17:03

Note: given that the environment is experiencing server-busy throttling from the EH service, the overall slowness may not change much, since the SDK (after any fix) still has to back off on server-busy.

The SDK requirement to apply back-off on server-busy is not a regression. It is a recommended pattern that exists in all generations of the Event Hubs library (including track1 3.4.x).

anuchandy • Mar 14 '24 17:03

@adriannowak Does this issue still occur?

samvaity • Nov 13 '25 23:11

Hi @adriannowak. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] • Dec 11 '25 19:12