dbt-sqlserver
dbt-sqlserver copied to clipboard
Retry model on connection errors
Hey Guys,
Is it possible to implement re-trying of queries if the query fails due to non-model related errors?
For example, sometimes Azure SQL fails with errors like this:
[2021-02-24 05:50:02,777] {dbt_rpc.py:83} INFO - Failed model `base_signups`: Database Error in model base_thrive_leads_signups (models/02_base/base_signups.sql)
('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLExecDirectW)')
compiled SQL at target/run/my_project/models/02_base/base_signups.sql
This is particularly frustrating in daily, automated runs through airlfow. I have solved it for now by adding retry logic through airflow that re-runs only failed and skipped models, but perhaps it would be worthwhile solving this at the dbt-sqlserver level for everyone?
Thanks, kim
Interesting. I have never had issues with that. Sounds like something to put in dbt-core? @jtcohen6 ?
@mikaelene There is retry logic within the connection object for some adapters, notably dbt-bigquery. Here we define which errors are intermittent, unlikely to recur, and thereby "retryable". Here is where those errors are caught, handled, and retried, up to the number of retries (docs) the user has defined:
I don't know what TCP Provider: Error code 0x68 is, what would cause it, or why it comes up intermittently, but all of that would be good to sort out before trying to handle it.
i was looking for a 'timeout' to wait until a connection is established.
i was looking for a 'timeout' to wait until a connection is established.
@majidaldo can you tell me more about what you're looking for with a timeout?
i was looking for a 'timeout' to wait until a connection is established.
@majidaldo can you tell me more about what you're looking for with a
timeout?
On Azure SQL, you can have a db 'sleep'. It can take a few minutes to resume. In the meantime, connection attempts usually fail.
i was looking for a 'timeout' to wait until a connection is established.
@majidaldo can you tell me more about what you're looking for with a
timeout?On Azure SQL, you can have a db 'sleep'. It can take a few minutes to resume. In the meantime, connection attempts usually fail.
great point! this is something we've been thinking about w/ azure sql serverless as well. do you mind opening a separate issue for that scenario? I believe your ask is different than @infused-kim's.
i was looking for a 'timeout' to wait until a connection is established.
@majidaldo can you tell me more about what you're looking for with a
timeout?On Azure SQL, you can have a db 'sleep'. It can take a few minutes to resume. In the meantime, connection attempts usually fail.
great point! this is something we've been thinking about w/ azure sql serverless as well. do you mind opening a separate issue for that scenario? I believe your ask is different than @infused-kim's.
https://github.com/dbt-msft/dbt-sqlserver/issues/162
We've seen some intermittent database connection error with Redshift, something like
connection to server at "redshift cluster domain name", port 5439 failed: timeout expired
If there is way to configure retry for a specified times, that would be great!
@yduan-polo I think you want to bring up your error as an issue on the dbt-redshift adapter repo!
I think we've got one for just the thing: https://github.com/dbt-labs/dbt-redshift/issues/96
(though the fix would more likely come in https://github.com/dbt-labs/dbt-core/issues/5022)
We've been encountering regular transient network issues as well which would be solved by a retry. Are there any plans to support this as a profile parameter (like in Postgres / redshift) or not?
We've been encountering regular transient network issues as well which would be solved by a retry. Are there any plans to support this as a profile parameter (like in Postgres / redshift) or not?
we could! can you share the error messages you're seeing, as well as the SQL Server product that you're using (on-prem, Azure SQL, etc)
Hi This is azure sql.
Encountered an error while running operation: Database Error 123 ('28000', "[28000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user ''. (18456) (SQLDriverConnect)")
Hi This is azure sql.
are you using serverless Azure SQL that auto-pauses? while frustrating, this is behavior & message we expect to see. Normally, ~30-60s later the database is resumed and the error message goes away.
Would you like for dbt to retry a connection every few seconds until the error is resumed. Also, if you do for example have the wrong password, would you like dbt to assume that the error is transient and keep trying every few seconds to re-connect?
not trying to be critical, i'm just narrowing down the use case
Hi
No we are not using the auto pause variant.
I think more something try x amount of times and then fail anyway in case of a non transient error. I believe the error code coming back from sql server should specify the kind of error that gets returned maybe that could narrow the cases were a retry would be even useful?
On 25 August 2022 at 20:14:22, Anders @.***) wrote:
Hi This is azure sql.
are you using serverless Azure SQL that auto-pauses? while frustrating, this is behavior & message we expect to see. Normally, ~30-60s later the database is resumed and the error message goes away.
Would you like for dbt to retry a connection every few seconds until the error is resumed. Also, if you do for example have the wrong password, would you like dbt to assume that the error is transient and keep trying every few seconds to re-connect?
not trying to be critical, i'm just narrowing down the use case
— Reply to this email directly, view it on GitHub https://github.com/dbt-msft/dbt-sqlserver/issues/119#issuecomment-1227607439, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACYIREBEF2DV6RK33JNE4RLV26ZX5ANCNFSM4YD34OPQ . You are receiving this because you commented.Message ID: @.***>
Thomas +32 475 269 452 <+32%20475%20269%20452>
The dbt-sqlserver 1.2.0 release includes an option retries in the target configuration which should mitigate these kinds of issues. Let me know if that doesn't work for you so that I can reopen this issue.
For reference for others finding this issue.
Code: https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/adapters/sqlserver/sql_server_connection_manager.py
Inherited by synapse: https://github.com/microsoft/dbt-synapse/blob/master/dbt/adapters/synapse/synapse_connection_manager.py
Documented here: https://docs.getdbt.com/docs/core/connect-data-platform/mssql-setup
Does anyone have guidance on recommended number of retries? I was going to default to 3, but is there a tradeoff here?
Does it actually retry after the initial connection for you guys? https://github.com/dbt-msft/dbt-sqlserver/issues/507