resizer
SqlReader Retry Logic
When using a PaaS database product like Azure SQL Database, occasional transient errors occur more frequently than with a traditional on-premises server. The usual solution is to add retry logic.
Good sources on which errors are retryable are here and here. If you would like a ready-made SqlConnection wrapper that implements retries, I can probably arrange to get you the one my client uses — I wrote it. DM me on Twitter @sharpjs to discuss further.
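As a rough sketch of what "retryable" means in practice: Azure SQL transient failures surface as a SqlException whose error numbers appear on Microsoft's documented transient-error list. The helper below is illustrative (the `IsTransient` name and the exact set of numbers are my own choices, not part of the plugin); a real implementation would walk `SqlException.Errors` and check each `SqlError.Number` against a set like this.

```csharp
using System;
using System.Collections.Generic;

static class SqlTransientErrors
{
    // A sampling of error numbers Microsoft documents as transient for
    // Azure SQL Database. Not exhaustive — consult the lists linked above.
    static readonly HashSet<int> TransientNumbers = new HashSet<int>
    {
        4060,         // Cannot open database
        40197,        // Service encountered an error processing the request
        40501,        // Service is currently busy
        40613,        // Database unavailable (the error quoted below)
        49918, 49919, // Not enough resources / cannot process create or update request
        10928, 10929, // Resource limits reached
    };

    // In real code this would take a SqlException and check each
    // SqlError.Number in its Errors collection against the set.
    public static bool IsTransient(int sqlErrorNumber)
    {
        return TransientNumbers.Contains(sqlErrorNumber);
    }
}
```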
Here is an example of one such error that came through my client's monitoring today:
System.Data.SqlClient.SqlException: Database '*****' on server '*****.database.windows.net' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '*****'.
at System.Data.SqlClient.SqlInternalConnectionTds..ctor (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionPool.CreateObject (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.SqlClient.SqlConnection.TryOpenInner (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.SqlClient.SqlConnection.TryOpen (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at System.Data.SqlClient.SqlConnection.Open (System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
at ImageResizer.Plugins.SqlReader.SqlReaderPlugin.RowExists (ImageResizer.Plugins.SqlReader, Version=3.4.3.103, Culture=neutral, PublicKeyToken=null)
at ImageResizer.Configuration.PipelineConfig.GetFile (ImageResizer, Version=3.4.3.103, Culture=neutral, PublicKeyToken=null)
at ImageResizer.InterceptModule.CheckRequest_PostAuthorizeRequest (ImageResizer, Version=3.4.3.103, Culture=neutral, PublicKeyToken=null)
at System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Web.HttpApplication.ExecuteStepImpl (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Web.HttpApplication.ExecuteStep (System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
Are you sure that is desirable? Retries can take down a server and prolong transient failures.
For example, imagine a server temporarily stops responding. Without retries, a few images fail to appear, but no additional load is placed on the server except end-users hitting refresh.
With retries, a backlog of retried requests is added to the queue, making it more likely that the server keeps failing requests instead of working down the queue and reducing response times.
With subscription quotas, this problem gets even worse.
There's not that much fetch code: https://github.com/imazen/resizer/blob/develop/Plugins/SqlReader/SqlReader.cs#L145-L218
It seems like it would be trivial to implement, but I would suggest such a feature be off by default.
Yes, retry of transient errors is desirable and recommended for Azure SQL Database.
As this request specifically targets documented[1] transient errors, which typically resolve quickly, the impact on services need not be severe. The impact can be lessened further by exponential backoff, and further still by a cooperative retry policy — stop retrying when a heuristic detects that retrying isn't helping.
Off-by-default is a safe choice. Some configurability of the retry policy would be nice as well:
- Initial wait before retry
- Maximum wait before retry
- Maximum number of retries before responding with failure
- Maximum number of failure responses before abandoning retries
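To make the four knobs concrete, here is a minimal sketch of such a policy. All names (`RetryPolicy`, `Execute`, the property names) are illustrative, not part of the plugin or any existing library; the "failures before abandoning" counter is a deliberately crude stand-in for the cooperative heuristic mentioned above.

```csharp
using System;
using System.Threading;

sealed class RetryPolicy
{
    public TimeSpan InitialWait { get; set; } = TimeSpan.FromMilliseconds(100); // initial wait before retry
    public TimeSpan MaximumWait { get; set; } = TimeSpan.FromSeconds(5);        // maximum wait before retry
    public int MaxRetries { get; set; } = 4;            // max retries before responding with failure
    public int MaxFailures { get; set; } = 50;          // failure responses before abandoning retries

    int failureCount; // crude cooperative heuristic: consecutive failures seen

    public T Execute<T>(Func<T> action, Func<Exception, bool> isTransient)
    {
        TimeSpan wait = InitialWait;
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                T result = action();
                Interlocked.Exchange(ref failureCount, 0); // success: reset the heuristic
                return result;
            }
            catch (Exception ex) when (isTransient(ex)
                && attempt < MaxRetries
                && Interlocked.Increment(ref failureCount) <= MaxFailures)
            {
                Thread.Sleep(wait);
                // Exponential backoff, capped at MaximumWait.
                wait = TimeSpan.FromTicks(Math.Min(wait.Ticks * 2, MaximumWait.Ticks));
            }
        }
    }
}
```

When the exception filter rejects (non-transient error, retries exhausted, or the failure budget spent), the original exception propagates unchanged to the caller, which keeps stack traces like the one above intact.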
[1] See the code links in the original post, and also this MS article.
Okay. Keep in mind browsers have their own request timeouts.
I'd be open to a pull request for this if the logic is clear and has tests.
I might be able to put together a PR, though not quickly.