[FEATURE REQ] Set retry/timeout options in configuration

Open UniperMaster opened this issue 2 years ago • 18 comments

Release version

4.1.0 Extractor (Latest)

Describe the bug

When running the extractor pipeline, the extractor fails with the message shown under "Error Message" below.

The APIM has around 257 APIs, and within them all:

anything from 1-50 operations on an API

Nearly all of them have policies configured at the operation level. No policy is configured at the API level

For the APIM itself:

SKU is Developer

APIM is using vnet integration (internal)

Diagnostics are configured

Products are used

Subscriptions are scoped at product level

Named values are a mix of secrets, Key Vault references and plain text values

The service principal has all the necessary permissions to do any API call to the APIM

No application gateway present

No self hosted gateway

A single backend configured

API version used to create the APIM was 2021-04-01-preview

Attached log: pipeline.log

Expected behavior

The extractor exports all APIs

Actual behavior

Times out at the "Exporting API" step. It does not progress further because the pipeline is terminated.

Reproduction Steps

Run the run-extractor.yaml pipeline with "Extract ALL".

UniperMaster avatar Mar 30 '23 14:03 UniperMaster

Error Message

  Writing API operation policy file /home/vsts/work/1/a/artifacts/apis/tableau-management-portal-v21/operations/signin/policy.xml...

crit: Extractor[0] System.AggregateException: Retry failed after 4 tries. Retry settings can be adjusted in ClientOptions.Retry or by configuring a custom retry policy in ClientOptions.RetryPolicy. (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.) (The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout.) ---> System.Threading.Tasks.TaskCanceledException: The operation was cancelled because it exceeded the configured timeout of 0:01:40. Network timeout can be adjusted in ClientOptions.Retry.NetworkTimeout. ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled

UniperMaster avatar Mar 30 '23 14:03 UniperMaster

At first sight, this looks like a connection issue: you seem to be running your APIM instance inside a VNet, and your DevOps agent may not be able to connect to it. Did you check that?

waelkdouh avatar Mar 30 '23 19:03 waelkdouh

Nope, it is not in a VNet. But doesn't the extractor use the Azure REST API to extract the data, and therefore shouldn't need access to the VNet?

UniperMaster avatar Mar 31 '23 06:03 UniperMaster

@guythetechie any thoughts on this one?

waelkdouh avatar Mar 31 '23 08:03 waelkdouh

It would seem the SKU is being overloaded. When I moved to Premium with 2 units, it was fine. Occasionally I could get it to work on the Developer SKU with 1 unit if I increased MaxRetries and Delay, but I hardcoded these settings. I'm not sure how to apply them in appsettings.json or how to include them in the pipeline, as that is not covered in the documentation. Any advice would be much appreciated.

UniperMaster avatar Apr 05 '23 16:04 UniperMaster

@UniperMaster - thanks for raising this. We extract all APIs in parallel to maximize performance: https://github.com/Azure/apiops/blob/d4cc9450a77746024bde2ca12decb99122de169b/tools/code/extractor/Api.cs#L28

I can see APIM throttling requests if you're using a low SKU and have a lot of APIs. The easiest way to validate that is by changing ForEachParallel to ForEachAsync in the line above. This will extract the APIs sequentially.

Are you in a position to make that change to the extractor code and retry?

guythetechie avatar Apr 12 '23 13:04 guythetechie
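
To illustrate the parallel versus sequential trade-off described above, here is a minimal sketch in Python. It is not code from the apiops repository (the extractor itself is .NET), and the names `export_all`, `export_one` and `max_concurrency` are hypothetical. It assumes the throttling comes from too many simultaneous requests, so it caps how many exports run at once; a cap of 1 behaves like a sequential run, and a large cap approaches the fully parallel behaviour.

```python
import asyncio


async def export_all(api_names, export_one, max_concurrency=1):
    """Export every API, running at most `max_concurrency` exports at once.

    `export_one` is a hypothetical coroutine function that exports one API.
    Results are returned in the same order as `api_names`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(name):
        # Wait for a free slot before starting this export.
        async with semaphore:
            return await export_one(name)

    return await asyncio.gather(*(guarded(name) for name in api_names))
```

A middle value (for example 4 or 8) may be worth trying on a small SKU before falling back to a fully sequential run, since it keeps some of the speed benefit while reducing the load on the service.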

By the way, these are the default retry options. We can expose some of this in configuration so that it's overridable.

guythetechie avatar Apr 12 '23 14:04 guythetechie
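
For readers unfamiliar with what these retry options control, the following Python sketch shows the general pattern: a per-attempt timeout plus a bounded number of retries with exponential backoff. It is an illustration only, not the Azure SDK's implementation. The defaults for `max_retries` and `network_timeout` are chosen to match the error message earlier in the thread (4 tries, each cancelled after 0:01:40); `base_delay`, `max_delay` and all function and class names are assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RetrySettings:
    max_retries: int = 3            # 1 initial attempt + 3 retries = the 4 tries in the log
    network_timeout: float = 100.0  # seconds; the 0:01:40 in the log
    base_delay: float = 1.0         # seconds; assumed value for illustration
    max_delay: float = 60.0         # seconds; assumed value for illustration


async def call_with_retry(operation, settings, sleep=asyncio.sleep):
    """Run `operation()` with a per-attempt timeout and exponential backoff."""
    errors = []
    for attempt in range(settings.max_retries + 1):
        try:
            # Cancel the attempt if it exceeds the per-attempt timeout.
            return await asyncio.wait_for(operation(), timeout=settings.network_timeout)
        except (asyncio.TimeoutError, ConnectionError) as error:
            errors.append(error)
            if attempt == settings.max_retries:
                break
            # Double the wait after each failure, up to max_delay.
            delay = min(settings.base_delay * 2 ** attempt, settings.max_delay)
            await sleep(delay)
    raise RuntimeError(f"Retry failed after {len(errors)} tries") from errors[-1]
```

Raising `max_retries` or `network_timeout` gives a slow service more room to respond, at the cost of a longer pipeline run when the service never recovers.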

@guythetechie, thanks, that's how I worked out it was a performance issue. I'd like to keep the repo as vanilla as possible. Would it be possible to make this a configurable option in the pipelines?

UniperMaster avatar Apr 15 '23 08:04 UniperMaster

Adding to backlog.

guythetechie avatar May 10 '23 02:05 guythetechie

Are there any updates on this? I am also experiencing the same timeout issue when extracting from a Developer tier APIM instance. It would also be nice to build a retry mechanism into the extractor pipeline, rather than having to upgrade to a Basic, Standard or Premium tier.

guestdj avatar Aug 31 '23 12:08 guestdj

We are currently working on implementing other features. Any chance you can submit a PR? We will be more than happy to review and merge it.

waelkdouh avatar Aug 31 '23 12:08 waelkdouh

Just wanted to update that I have found a solution to the throttling issue.

It turns out that it was due to the platform version set on my APIM instance, not the Developer tier as I thought. The version was set to stv1, which uses a cloud service (classic) architecture; the newer version, stv2, uses VM scale sets.

When I tested my extraction against an APIM stv2 instance with the same APIs on it, the extraction worked with no issues. So now I have to figure out how to migrate the stv1 instance to stv2, but there is some documentation on this too.

More info on platform versions can be found here

Hope this helps anyone stuck with the same issue, maybe not that many going forward as I think stv2 is now the default version for new APIM instances.

guestdj avatar Sep 08 '23 13:09 guestdj

@guestdj thank you for the update. This is great information.

@UniperMaster can you confirm if this resolves your issue as well and close the issue accordingly?

waelkdouh avatar Sep 08 '23 14:09 waelkdouh

Hey guys,

I am observing the same timeout issues when testing on a Premium tier, stv2 APIM (with 1 to 3 units). The migration from stv1 to stv2 did not solve the issue but slightly improved the situation (more items were processed before the timeouts hit). So making the timeout configurable would be a great feature here.

Also, I haven't tested it, but a config option for choosing between a parallel and a sequential run would be great.

vandanchev avatar Nov 17 '23 14:11 vandanchev
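
As a rough idea of what such configuration could look like, here is a small Python sketch that reads the requested options from environment variables, which a pipeline step could set. Everything in it is hypothetical: the extractor does not necessarily expose these settings, and the variable names (`EXTRACTOR_MAX_RETRIES`, `EXTRACTOR_NETWORK_TIMEOUT_SECONDS`, `EXTRACTOR_RUN_IN_PARALLEL`) are invented for the example. The defaults mirror the behaviour reported in this thread (4 tries, 100-second timeout, parallel extraction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorSettings:
    max_retries: int = 3                    # 3 retries + 1 initial attempt
    network_timeout_seconds: float = 100.0  # the 0:01:40 seen in the error
    run_in_parallel: bool = True            # the parallel behaviour described above


def load_settings(env):
    """Build settings from a mapping such as os.environ, falling back to defaults.

    All variable names below are hypothetical.
    """
    defaults = ExtractorSettings()
    parallel_raw = env.get("EXTRACTOR_RUN_IN_PARALLEL")
    if parallel_raw is None:
        run_in_parallel = defaults.run_in_parallel
    else:
        run_in_parallel = parallel_raw.strip().lower() in ("1", "true", "yes")
    return ExtractorSettings(
        max_retries=int(env.get("EXTRACTOR_MAX_RETRIES", defaults.max_retries)),
        network_timeout_seconds=float(
            env.get("EXTRACTOR_NETWORK_TIMEOUT_SECONDS", defaults.network_timeout_seconds)
        ),
        run_in_parallel=run_in_parallel,
    )
```

Calling `load_settings(os.environ)` at startup would keep the repository unchanged for users who are happy with the defaults, while letting others override individual values per pipeline run.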

Hi @vandanchev can you submit a PR so we can review it? I believe it's already an item on our backlog.

waelkdouh avatar Nov 17 '23 16:11 waelkdouh

Our Dev instance is stv1. I can't test this at the moment, but we are planning to migrate to stv2 soon. When I initially tested it, it worked on Premium.

UniperMaster avatar Nov 20 '23 09:11 UniperMaster

I have another update on the timeout issue. After doing the platform update on an APIM instance that had a LOT of APIs, I started getting the timeout issue again. However, upgrading to the Premium tier has now fixed it, albeit as a very expensive fix. I'm now waiting for the Standard v2 tier that goes GA next April!

guestdj avatar Dec 01 '23 14:12 guestdj

@guestdj this is great feedback! Please keep updating this thread, which will serve as a knowledge base for others in the future.

waelkdouh avatar Dec 01 '23 14:12 waelkdouh