
How does clientRetryOptions work in relation to Retry Policies?

Open MartinWickman opened this issue 3 years ago • 21 comments

I'm trying to understand how and when clientRetryOptions and maxAutoLockRenewalDuration are used. It's not clear from the docs.

What's confusing is when you use retry policies attributes which you put on functions. It seems to me they conflict with each other, or I am just missing something crucial here?

It boils down to this:

  1. Are the retry policy attributes (such as [FixedDelayRetry]) related to the clientRetryOptions setting in host.json? Are they the same? Will one override the other, or will they multiply on top of each other?
  2. How and when does the maxAutoLockRenewalDuration setting come into play? The default is 5 minutes, but the default lock duration on Service Bus is about 1 minute. Doesn't that mean the lock will expire after one minute and then be renewed after 5 minutes? What about when a retry policy is doing retries (possibly for hours)? I don't get it.

    [FixedDelayRetry(maxRetryCount: 10, delayInterval: "00:00:30")]
    public void MyFunction([ServiceBusTrigger("%QueueName%", Connection = "Connection")] string message)

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions": {
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            },
            "maxAutoLockRenewalDuration": "00:05:00"
        }
    }
}


MartinWickman avatar Mar 22 '22 15:03 MartinWickman

Thank you for your feedback! We will review and update as appropriate.

mike-urnun-msft avatar Mar 22 '22 18:03 mike-urnun-msft

Hello @MartinWickman - I answered your questions below:

Are the retry policy attributes (such as [FixedDelayRetry]) related to the clientRetryOptions setting in host.json? Are they the same? Will one override the other or will they multiply on top of each other?

Both retry policies are separate and layer on top of each other. As a result, the total number of retries multiplies. You may review this explanation, which discusses the combined effect of the runtime retry policy and the Service Bus retry policy.
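To make the multiplication concrete, here is a minimal sketch of the arithmetic for the two layers described in this reply: a function-level retry policy (such as [FixedDelayRetry]) stacked on Service Bus redelivery. It assumes each delivery gets one initial execution plus maxRetryCount retries, and that the broker redelivers up to the queue's max delivery count (10 by default). The function name is hypothetical, and the sketch does not model clientRetryOptions.

```python
def total_executions(max_delivery_count: int, max_retry_count: int) -> int:
    """Worst-case number of function executions for one failing message.

    Assumption: every delivery runs the function once and then retries it
    max_retry_count times before the message is abandoned and redelivered.
    """
    attempts_per_delivery = 1 + max_retry_count
    return max_delivery_count * attempts_per_delivery


# Default max delivery count of 10 combined with maxRetryCount: 10
print(total_executions(max_delivery_count=10, max_retry_count=10))
```

Under these assumptions a message that always fails would be executed 110 times before it is dead-lettered.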

How and when does the maxAutoLockRenewalDuration setting come into play? The default is 5 minutes, but the default lock duration on Service Bus is about 1 minute. Doesn't that mean the lock will expire after one minute and then be renewed after 5 minutes? What about when a retry policy is doing retries (possibly for hours)? I don't get it.

maxAutoLockRenewalDuration is set by the Service Bus consumer (client) application, which in this case is the Azure Functions app, whereas Lock Duration is a setting on the Service Bus broker. In other words, Lock Duration, which you configure on the queue or subscription, determines how long a message stays locked while a consumer processes it. The lock prevents other consumers from processing the same message and causing a race condition. If processing might need more time than that, the consumer application can set maxAutoLockRenewalDuration so that the lock keeps being renewed, up to that maximum duration.
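As a rough illustration of how the two settings interact, the sketch below checks whether a single function execution is likely to finish while the lock is still held. It uses a simplified model, which is an assumption rather than the extension's exact behavior: the client keeps renewing the lock until maxAutoLockRenewalDuration has elapsed since receipt, so the lock is held for about the larger of the two values. The function name is hypothetical; the defaults are the 1 minute and 5 minutes mentioned in the question.

```python
from datetime import timedelta


def lock_is_covered(
    processing_time: timedelta,
    lock_duration: timedelta = timedelta(minutes=1),
    max_auto_renew: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if the message lock plausibly outlives one execution.

    Simplified model (assumption): the lock is first granted for
    lock_duration, then renewed repeatedly until max_auto_renew has
    elapsed, so it is held for roughly max(lock_duration, max_auto_renew).
    """
    held_for = max(lock_duration, max_auto_renew)
    return processing_time <= held_for


print(lock_is_covered(timedelta(minutes=3)))   # within the renewal window
print(lock_is_covered(timedelta(minutes=6)))   # longer than the renewal window
```

In this model, an execution (including any in-process retries) that runs longer than the renewal window loses the lock, and the broker can hand the message to another consumer.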

Since we didn't determine any changes to this doc upon reviewing your feedback, we will now proceed to close this thread. If there are further questions regarding this matter, please reopen it and we will gladly continue the discussion.

mike-urnun-msft avatar Mar 29 '22 04:03 mike-urnun-msft

Thanks @mike-urnun-msft for your response. I have one follow-up question:

Are the retry policy attributes (such as [FixedDelayRetry]) related to the clientRetryOptions setting in host.json? Are they the same? Will one override the other or will they multiply on top of each other?

Both retry policies are separate and layer on top of each other. As a result, the total number of retries multiplies. You may review this explanation

I do think we're talking about different things here. I am referring to the serviceBus/clientRetryOptions setting in host.json (see below). The resilient retries you are talking about are the ones defined on the Service Bus itself, and those are certainly not defined here.

How is the serviceBus/clientRetryOptions setting related to the retry policy attributes (such as [FixedDelayRetry])? Clearly there is something I'm missing here (or I'm reading the docs wrong).

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions":{
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            },
            "maxAutoLockRenewalDuration": "00:05:00"
        }
    }
}

MartinWickman avatar Mar 29 '22 12:03 MartinWickman

@mike-urnun-msft did you see my question above? I don't feel this issue is quite resolved yet.

MartinWickman avatar Apr 29 '22 09:04 MartinWickman

Can I bump this please, I'm currently having to debug an issue with an Azure Function that's triggered from Azure Service Bus, however via the Custom Handler Approach (so we don't have the Azure Function Attributes on our Functions' "Run" Method).

What we're seeing is the retry options not being obeyed and the docs are very unclear as to what maps to what.

AJMcKane avatar Jun 16 '22 16:06 AJMcKane

Retry policies going forward will only be supported for Timer and Event Hubs triggers. We've updated the docs for the retry policy GA here: https://docs.microsoft.com/azure/azure-functions/functions-bindings-error-pages#retries

ggailey777 avatar Jun 16 '22 17:06 ggailey777

I should also point out those client retry options were introduced in v5.x of the extension. Are you using the latest version of the Service Bus extension?

ggailey777 avatar Jun 16 '22 17:06 ggailey777

Ahh, it looks like we're on 2.0

"extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[1.*, 2.0.0)"
  },
  "functionTimeout": "01:00:00",
  "customHandler": {
    "description": {
      "defaultExecutablePath": "FunctionHandler",
      "workingDirectory": "",
      "arguments": []
    },
    "enableForwardingHttpRequest": true
  },
  "extensions": {
    "serviceBus": {
      "clientRetryOptions": {
        "mode": "exponential",
        "tryTimeout": "00:05:00",
        "delay": "00:01:00",
        "maxDelay": "00:10:00",
        "maxRetries": 5
      },
      "messageHandlerOptions": {
        "maxConcurrentCalls": 3
      }
    }
  }

I'll action that with my team and see if it helps! Thanks :)

AJMcKane avatar Jun 17 '22 14:06 AJMcKane

Retry policies going forward will only be supported for Timer and Event Hubs triggers. We've updated the docs for the retry policy GA here: https://docs.microsoft.com/azure/azure-functions/functions-bindings-error-pages#retries

That's quite the surprise! I'm sure lots of people are using things like [ExponentialBackoffRetry] to handle retries, especially for Service Bus. Just to make it clear: Service Bus's native retry support is not even close to being the same thing, and to be frank, having Service Bus retry the same message 10 times as fast as possible and then dead-letter it does not help anyone mitigate temporary errors. What is missing is "retry with delay", and that's what the policies are (were) used for.

So what would be a reasonable migration strategy for those people? I'm sure you have thought about that and simply forgot to update the documentation.

MartinWickman avatar Jun 17 '22 15:06 MartinWickman

What I found strange coming from AWS SQS to Azure and Service bus is that retries don't re-enter the queue. My expectation would be to put the message back on the queue (at the bottom) with a minimum retry delay.

The reason we've stumbled into this area is that with the current retry behaviour, if you have a large block of messages that'll fail (say due to a transient corrupted piece of data, or temp api outage), your entire ingestion will block up as your X Functions constantly keep retrying messages instead of cycling through them in order.

AJMcKane avatar Jun 17 '22 15:06 AJMcKane

I am also very interested in seeing what the alternative options are, since as far as I am aware there is so far no mechanism in Service Bus that allows for delayed retries. I have in fact had to use Durable Functions to achieve this goal, since even ExponentialBackoffRetry is not bulletproof, although it worked to some extent.

ilya-git avatar Jun 20 '22 08:06 ilya-git

I've updated the extension bundle version to [3.0.0, 4.0.0), which includes v5 of the Service Bus extension, and the retries are still triggering immediately. This is a custom handler with a serviceBus "in" binding.

From what the docs say the clientRetryOptions should work in this use case?

AJMcKane avatar Jun 21 '22 15:06 AJMcKane

I also want to bump this since we're also seeing the ClientRetryOptions not being obeyed and the docs are very unclear.

tufberg avatar Sep 30 '22 07:09 tufberg

Same here...have been struggling with this. ClientRetryOptions not being obeyed.

cmclellen avatar Oct 03 '22 08:10 cmclellen

+1

pferrot avatar Oct 25 '22 11:10 pferrot

I am also very interested in how to get retry delay to work, I have been testing with the host.json file without expected result.

AnjaEndresen01 avatar Nov 07 '22 07:11 AnjaEndresen01

We've had another occurrence of our retry options not working. Does anyone know of the recommended work-around here?

AJMcKane avatar Nov 08 '22 15:11 AJMcKane

Also struggling with this issue. Even after setting ClientRetryOptions in Startup.cs.

Sakkie avatar Nov 18 '22 06:11 Sakkie

Retry policies going forward will only be supported for Timer and Event Hubs triggers. We've updated the docs for the retry policy GA here: https://docs.microsoft.com/azure/azure-functions/functions-bindings-error-pages#retries

That's quite the surprise! I'm sure lots of people are using things like [ExponentialBackoffRetry] to handle retries, especially for Service Bus. Just to make it clear: Service Bus's native retry support is not even close to being the same thing, and to be frank, having Service Bus retry the same message 10 times as fast as possible and then dead-letter it does not help anyone mitigate temporary errors. What is missing is "retry with delay", and that's what the policies are (were) used for.

So what would be a reasonable migration strategy for those people? I'm sure you have thought about that and simply forgot to update the documentation.

It seems that many people (myself included) are struggling to find definitive guidance on this. Since the functionality that supported delayed Service Bus retries was removed between preview and GA, you would think the team would at least offer a recommendation for achieving the same result.

The question was also asked in https://github.com/Azure/azure-functions-dotnet-worker/issues/955 and went unanswered and that conversation is locked. @mike-urnun-msft I think it would help a lot of folks in this issue and in the one referenced to at least have a recommendation or official input from the team on how to best accomplish this.

andrewdmoreno avatar Dec 01 '22 19:12 andrewdmoreno

OK, I'll let you know next week how things work, buddy.

Extravisio avatar Dec 01 '22 20:12 Extravisio

@andrewdmoreno @Extravisio @Sakkie and the rest - My apologies for the long silence. I'll revisit the use cases here, raise this issue internally for further clarity, and share my findings with you all here.

mike-urnun-msft avatar Dec 01 '22 21:12 mike-urnun-msft

+1, struggling with this issue and looking forward to hearing about the alternative options for a delayed retry policy.

sanjastojkova avatar Dec 06 '22 23:12 sanjastojkova

+1. According to the documentation, clientRetryOptions works for transient errors. How do we evaluate and test these scenarios? We are having a hard time implementing retries for Service Bus triggered functions.

NagaMellempudi avatar Dec 07 '22 18:12 NagaMellempudi

+1

nour95 avatar Jan 12 '23 14:01 nour95

I think the OP has raised some valid questions and concerns here.

While I am a big fan of Azure functions for the simplicity it provides, the documentation regarding retries needs better explanation and scenario specific elaboration.

I am reading the article titled Azure Service Bus bindings for Azure Functions

        "serviceBus": {
            "clientRetryOptions":{
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            }

What am I to understand from the caveat "They don't affect retries of function executions"? Does exponential backoff work for Service Bus or not? I need to handle transient errors.

Thanks.

sdg002 avatar Jan 22 '23 22:01 sdg002

Hello team, can somebody from Microsoft please confirm whether the exponential backoff setting under the clientRetryOptions element of host.json works for Python Azure Functions?

I am using Azure Functions Tools 4..0.4 and Python version is 3.9.7

Thanks, Sau

sdg002 avatar Jan 23 '23 12:01 sdg002

Ahh, it looks like we're on 2.0

"extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[1.*, 2.0.0)"
  },
  "functionTimeout": "01:00:00",
  "customHandler": {
    "description": {
      "defaultExecutablePath": "FunctionHandler",
      "workingDirectory": "",
      "arguments": []
    },
    "enableForwardingHttpRequest": true
  },
  "extensions": {
    "serviceBus": {
      "clientRetryOptions": {
        "mode": "exponential",
        "tryTimeout": "00:05:00",
        "delay": "00:01:00",
        "maxDelay": "00:10:00",
        "maxRetries": 5
      },
      "messageHandlerOptions": {
        "maxConcurrentCalls": 3
      }
    }
  }

I'll action that with my team and see if it helps! Thanks :)

Hello @AJMcKane, @ggailey777, could one of you please guide me on how to install and reference version 5 of the Azure extensions?

Thanks.

sdg002 avatar Jan 23 '23 13:01 sdg002

@sdg002 updating the version of your Azure.Functions.ExtensionBundle to the latest does this.
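A quick way to check which bundle range a host.json pins is sketched below. It reads extensionBundle.version and extracts the major version of the lower bound. The threshold of 3 is an assumption taken from this thread (the [3.0.0, 4.0.0) range is reported above to include v5 of the Service Bus extension), and the helper names are hypothetical.

```python
import json
import re

# Assumption from this thread: bundle ranges starting at 3.x or later ship
# v5.x of the Service Bus extension, which introduced clientRetryOptions.
MIN_BUNDLE_MAJOR = 3


def bundle_lower_bound_major(host_json_text: str) -> int:
    """Return the major version of the lower bound of extensionBundle.version."""
    config = json.loads(host_json_text)
    version_range = config["extensionBundle"]["version"]  # e.g. "[1.*, 2.0.0)"
    match = re.match(r"\s*[\[(]\s*(\d+)", version_range)
    if match is None:
        raise ValueError(f"unrecognized version range: {version_range!r}")
    return int(match.group(1))


def supports_client_retry_options(host_json_text: str) -> bool:
    """True if the pinned bundle range starts at MIN_BUNDLE_MAJOR or later."""
    return bundle_lower_bound_major(host_json_text) >= MIN_BUNDLE_MAJOR


old = (
    '{"extensionBundle": {"id": "Microsoft.Azure.Functions.ExtensionBundle",'
    ' "version": "[1.*, 2.0.0)"}}'
)
print(supports_client_retry_options(old))  # prints False
```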

@mike-urnun-msft do we have any update or ETA on a solution / alternate option for this?

AJMcKane avatar Feb 02 '23 12:02 AJMcKane

@mike-urnun-msft, any updates on it?

vadymal avatar Mar 08 '23 10:03 vadymal

As far as I know, there is still no solution for a real exponential backoff. I ended up building a NuGet package myself and hacked some reflection in there to create a MessageAction+ binding for the ServiceBus function trigger. It can be used to back off a message, and it works by completing the current message and creating a new postponed message. That only works for a queue, though, not a topic (as you cannot create a message on a topic for just one subscription).
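The "complete and resubmit with a delay" idea can be sketched independently of any SDK. The snippet below only plans the retry: it tracks the attempt number in the message's application properties and computes when the copy should become visible again. The property name retry-attempt, the base delay, the cap, and the attempt limit are all assumptions for illustration, not part of the package described above.

```python
from datetime import datetime, timedelta, timezone


def backoff_delay(
    attempt: int,
    base: timedelta = timedelta(seconds=30),
    cap: timedelta = timedelta(minutes=10),
) -> timedelta:
    """Exponential delay for a 1-based attempt number, capped at `cap`."""
    return min(base * (2 ** (attempt - 1)), cap)


def plan_retry(properties: dict, max_attempts: int = 5, now: datetime = None):
    """Plan the resubmission of a failed message.

    Returns (new_properties, scheduled_enqueue_time) for the postponed copy,
    or None when the attempt budget is spent and the caller should give up
    (for example, by dead-lettering the message).
    """
    now = now or datetime.now(timezone.utc)
    attempt = int(properties.get("retry-attempt", 0)) + 1
    if attempt > max_attempts:
        return None
    new_properties = {**properties, "retry-attempt": attempt}
    return new_properties, now + backoff_delay(attempt)


start = datetime(2023, 3, 13, tzinfo=timezone.utc)
props, when = plan_retry({}, now=start)
print(props, when - start)
```

The caller would then send a copy of the message with the returned properties and scheduled enqueue time (for example, via schedule_messages on an azure-servicebus sender) and let the original message complete. As noted above, this only fits queues, because a copy sent to a topic reaches every subscription.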

tomkuijsten avatar Mar 13 '23 21:03 tomkuijsten