[Feature Request] Retry-able errors

Open pemontto opened this issue 3 years ago • 2 comments

Summary of the new feature/enhancement

We sometimes get pipeline errors when deploying rules with Import-AzSentinelAlertRule because of a transient error, most commonly a gateway timeout on Microsoft's side:

VERBOSE: {"error":{"code":"GatewayTimeout","message":"The gateway did not receive a response from 'Microsoft.SecurityInsights' within the specified time period."}}
Line |
  40 |  $result = Import-AzSentinelAlertRule -SubscriptionId $SubscriptionId  …
     |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Unable to invoke webrequest for rule Failed host logons but
     | success logon to AzureAD with error message: Response status
     | code does not indicate success: 504 (Gateway Timeout).

Proposed technical implementation details (optional)

Include retry logic for status codes that represent a server-side issue. For example, retry 5xx errors up to 3 times, but continue or fail immediately on 4xx errors such as 400 (Bad Request):

{"error":{"code":"BadRequest","message":"Failed to run the alert rule query. One of the tables does not exist."}}
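The retry policy proposed above could be sketched roughly as follows. This is only an illustration, written in Python rather than PowerShell, and it is not AzSentinel code: `is_retryable`, `call_with_retry`, and the `send` callable are hypothetical names, and the exponential backoff between attempts is an added assumption that the proposal does not mention. "Up to 3 times" is read here as one initial attempt plus at most three retries.

```python
import time
from typing import Callable, Tuple


def is_retryable(status_code: int) -> bool:
    """Treat 5xx responses (e.g. 504 Gateway Timeout) as transient."""
    return 500 <= status_code <= 599


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,          # from the proposal: retry up to 3 times
    base_delay: float = 1.0,       # assumption: seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, int]:
    """Call send() until it returns a non-retryable status or retries run out.

    send is a hypothetical callable returning (status_code, body).
    Returns (status_code, body, attempts_made).
    """
    attempts = 0
    while True:
        status, body = send()
        attempts += 1
        # 4xx (and any other non-5xx) results are returned immediately;
        # 5xx results are returned once the retry budget is exhausted.
        if not is_retryable(status) or attempts > max_retries:
            return status, body, attempts
        # Assumption: exponential backoff (1s, 2s, 4s with the defaults).
        sleep(base_delay * 2 ** (attempts - 1))
```

With this shape, a 400 (Bad Request) such as the missing-table error above is returned after a single attempt, while a 504 is retried until it succeeds or the retry budget is used up. The caller still decides whether a final non-success status should fail the pipeline or just be logged.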

pemontto • May 25 '21 07:05

Hi @pemontto, thanks for the feedback, this sounds like a good proposal. Do you have a code example or working solution that I can integrate into AzSentinel?

pkhabazi • Aug 02 '21 10:08

@pkhabazi I need to do some testing with the API to observe the various responses. There are likely some 4xx and 5xx errors we can safely retry. Will get back to you!

pemontto • Oct 01 '21 15:10