microsoft-authentication-library-for-js icon indicating copy to clipboard operation
microsoft-authentication-library-for-js copied to clipboard

Retry for invalid_grant errors - Silent Iframe

Open jo-arroyo opened this issue 1 year ago • 4 comments

First of 3 PRs addressing need for retry for backup auth service.

jo-arroyo avatar Jul 25 '24 02:07 jo-arroyo

I wonder if we should log a flag to telemetry to explicitly indicate a retry was performed? Or is it preferable to determine this by building queries with the existing data? I expect this will have big perf implications if this is done at scale and it might be nice to have an easy way to track this. Just a suggestion but I'll leave that up to you and @konstantin-msft

tnorling avatar Jul 29 '24 22:07 tnorling

I wonder if we should log a flag to telemetry to explicitly indicate a retry was performed? Or is it preferable to determine this by building queries with the existing data? I expect this will have big perf implications if this is done at scale and it might be nice to have an easy way to track this. Just a suggestion but I'll leave that up to you and @konstantin-msft

All of these retries need an explicit indicator in telemetry that they are happening. It would likely be helpful to log the error returned for both the original call (invalid grant today, possibly others in the future?) as well as the retry result.

peterzenz avatar Jul 30 '24 15:07 peterzenz

I wonder if we should log a flag to telemetry to explicitly indicate a retry was performed? Or is it preferable to determine this by building queries with the existing data? I expect this will have big perf implications if this is done at scale and it might be nice to have an easy way to track this. Just a suggestion but I'll leave that up to you and @konstantin-msft

All of these retries need an explicit indicator in telemetry that they are happening. It would likely be helpful to log the error returned for both the original call (invalid grant today, possibly others in the future?) as well as the retry result.

It makes sense to add a new telemetry data point that indicates a retry was performed. Error codes, in turn, are captured in the "context" field for each instrumented function. However, there are two scenarios here:

  • Both the original call and retry fail -> the context is always set. The retry error is placed in the errorCode field, while the first error can be found in the "context" field only.
  • Original call fails and retry succeeds -> the context is redacted for 90% of the calls. Hence, we won't be able to capture most of the original errors so counting their numbers by type won't be possible. So, it makes sense to add yet another telemetry data point that captures the original error code.

konstantin-msft avatar Jul 30 '24 17:07 konstantin-msft

I wonder if we should log a flag to telemetry to explicitly indicate a retry was performed? Or is it preferable to determine this by building queries with the existing data? I expect this will have big perf implications if this is done at scale and it might be nice to have an easy way to track this. Just a suggestion but I'll leave that up to you and @konstantin-msft

All of these retries need an explicit indicator in telemetry that they are happening. It would likely be helpful to log the error returned for both the original call (invalid grant today, possibly others in the future?) as well as the retry result.

It makes sense to add a new telemetry data point that indicates a retry was performed. Error codes, in turn, are captured in the "context" field for each instrumented function. However, there are two scenarios here:

  • Both the original call and retry fail -> the context is always set. The retry error is placed in the errorCode field, while the first error can be found in the "context" field only.
  • Original call fails and retry succeeds -> the context is redacted for 90% of the calls. Hence, we won't be able to capture most of the original errors so counting their numbers by type won't be possible. So, it makes sense to add yet another telemetry data point that captures the original error code.

Thanks for the input, all! I added a new data point. Let me know if you prefer a different name.

jo-arroyo avatar Jul 30 '24 23:07 jo-arroyo