amazon-redshift-python-driver icon indicating copy to clipboard operation
amazon-redshift-python-driver copied to clipboard

[Bug] BrokenPipe: server socket closed error when we execute any dbt model in dbt core with redshift adapter

Open karthickGK4794 opened this issue 1 year ago • 11 comments

We are facing an issue if we execute a dbt model in dbt core with redshift adapter in multi nodes or multi threads. We did some analysis to find the reason for a failure by checking the dbt logs and we found that at some random point one of the threads was going to idle state for a certain period of time and it was not executing any tasks sent to it and eventually it got failed with the below statement. BrokenPipe: server socket closed. Please check that client side networking configurations such as Proxies, firewalls, VPN, etc. are not affecting your network connection. We suspect this could be because of introduction of redshift_connector in dbt-redshift adapter by replacing psycopg2. We weren't facing any such issues in dbt older versions which used psycopg2. Attaching the context from dbt log file

13:28:59.394786 [debug] [Thread-1 (]: Elementary: [dbt_tests] Artifacts changed.
.
. <LOT OF OTHER THREADS GO HERE USING ANOTHER CONNECTIONS>
.
13:53:01.835859 [debug] [Thread-1 (]: Elementary: Deleting from and inserting to: "elementary"."dbt_tests"
13:53:01.859812 [debug] [Thread-1 (]: Using redshift connection "model.elementary.dbt_tests"
13:53:01.860853 [debug] [Thread-1 (]: On model.elementary.dbt_tests: /* {"app": "dbt", "dbt_version": "1.8.9", "profile_name": "", "target_name": "", "node_id": "model.elementary.dbt_tests"} */
<SQL GOES HERE>
13:53:01.863164 [debug] [Thread-1 (]: Redshift adapter: Rolling back transaction.
13:53:01.864028 [debug] [Thread-1 (]: On model.elementary.dbt_tests: ROLLBACK
13:53:01.866800 [debug] [Thread-1 (]: Failed to rollback 'model.elementary.dbt_tests'
13:53:01.873315 [debug] [Thread-1 (]: On model.elementary.dbt_tests: Close
13:53:01.879586 [debug] [Thread-1 (]: Runtime Error in model dbt_tests (models/edr/dbt_artifacts/dbt_tests.sql)
BrokenPipe: server socket closed. Please check that client side networking configurations such as Proxies, firewalls, VPN, etc. are not affecting your network connection.
13:53:01.881652 [debug] [Thread-1 (]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '382ae431-f242-4924-a625-8b4dd918ff58', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb52f6a2a10>]}
13:53:01.882856 [error] [Thread-1 (]: 1 of 1 ERROR creating sql incremental model elementary.dbt_tests .......... [ERROR in 1646.81s]

As you can see last time when connection was used in Thread-1 was at 13:28:59.394786 (first line) After almost 25mins dbt got back to this connection to sent the next statement but the connection was no longer open There is no keep_alive option in redshift_connector - this is something we had specified in the old redshift adapter.

karthickGK4794 avatar Feb 06 '25 16:02 karthickGK4794

Hi, facing the same issue, Have you been able to resolve it?

efr-essakhan avatar Feb 11 '25 12:02 efr-essakhan

We are struggling with the same problem

kostek-pl avatar Feb 11 '25 13:02 kostek-pl

same problem here

joaobernardopa avatar Feb 11 '25 14:02 joaobernardopa

@karthickGK4794, there is an undocumented setting called tcp_keepalive that can be set to True. By default, it's set to None. Can you please test whether this addresses your issue?

bsharifi avatar Feb 25 '25 20:02 bsharifi

@karthickGK4794, there is an undocumented setting called tcp_keepalive that can be set to True. By default, it's set to None. Can you please test whether this addresses your issue?

I see this parameter has been deprecated after 1.5 in dbt however i tried to add it in the profiles.yml of my dbt project like below but it is not working

Image

Image

karthickGK4794 avatar Feb 26 '25 13:02 karthickGK4794

@karthickGK4794 dbt-redshift maintainer here, have you set retries in your profile? We recently made a change to retry these exceptions as they are usually transient

colin-rogers-dbt avatar Mar 04 '25 05:03 colin-rogers-dbt

@karthickGK4794 dbt-redshift maintainer here, have you set retries in your profile? We recently made a change to retry these exceptions as they are usually transient

Image

@colin-rogers-dbt - We have retry parameter set to 1 by default but we are not sure whether this retry is working as expected or not as we dont see any log or information related to this in dbt log. Could you let me know how this can be traced?

karthickGK4794 avatar Mar 04 '25 08:03 karthickGK4794

Same issue here, but we’re not using dbt-redshif. We also rely on redshift_connector for our Redshift connections, and we’re encountering the same “(redshift_connector.error.InterfaceError) BrokenPipe: server socket closed.” error. Specifically:

  • We tried enabling TCP keepalive, but that didn’t resolve the problem.

tatjana-lebesgue avatar Mar 04 '25 13:03 tatjana-lebesgue

Looking into this issue, the driver does enable tcp keepalives by default but does not set the interval. Therefore, depending upon the operating system configuration, the interval may be too long. There are some instruction here for configuring the operating system default keep alive intervals: https://docs.aws.amazon.com/redshift/latest/mgmt/troubleshooting-connections.html#connecting-firewall-guidance.change-tcpip-settings

bsharifi avatar Mar 04 '25 22:03 bsharifi

@bsharifi have you actually found that to work though? From our experience, reducing the tcp keepalive timings still has no effect - after a few minutes of a connection being idle, the socket is dead - it seems the tcp keepalive packets aren't enough to keep the socket open server side.

mwoods-familiaris avatar May 01 '25 09:05 mwoods-familiaris

Proposal to set the interval and other tcp settings per OS: https://github.com/aws/amazon-redshift-python-driver/pull/262

colin-rogers-dbt avatar May 05 '25 18:05 colin-rogers-dbt