amazon-redshift-python-driver icon indicating copy to clipboard operation
amazon-redshift-python-driver copied to clipboard

cannot connect to redshift from superset ui when using iam authentication

Open mdeshmu opened this issue 3 years ago • 1 comments

Driver version

Redshift Python Driver 2.0.908

Redshift version

1.0.37493

Client Operating System

Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.2.5

Python version

3.8.12

Problem description

I am trying to connect to redshift via redshift_connector using passwordless IAM Authentication. It works fine when i directly login to ecs container where superset is hosted and create/execute the cursor but When i try to create a database connection from super ui then I am getting ConnectionRefusedError.

  1. Expected behaviour: Should be able to connect and query to Redshift from Superset UI

  2. Actual behaviour: Not able to connect to Redshift from Superset UI but can connect from the container where Superset container is running.

  3. Error message/stack trace: ERROR: (redshift_connector.error.InterfaceError) ('communication error', ConnectionRefusedError(111, 'Connection refused')) (Background on this error at: http://sqlalche.me/e/13/rvf5)

  4. Any other details that can be helpful: SQL Alchemy URL: redshift+redshift_connector:///<databasename> ENGINE PARAMETERS:

{
  "connect_args":
  {
  "ssl":"True",
  "region":"us-east-1",
  "iam":"True",
  "cluster_identifier":"<clusteridentifier>",
  "db_user":"<databaseuser>"
  }
}

I can telnet over 5439 to the redshift from ecs container as well as my local system.

Python Driver trace logs

driver-cursor

Reproduction code

please see in the comment

mdeshmu avatar Aug 13 '22 18:08 mdeshmu

Stack trace of sqlalchemy-redshift + redshift_connector

root@ip-:/app# python3
Python 3.8.12 (default, Mar  2 2022, 04:56:27)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import redshift_connector
>>> import logging
>>> import sys
>>> import sqlalchemy as sa
>>> logger = logging.getLogger()
>>> logger.setLevel(logging.DEBUG)
>>> handler = logging.StreamHandler(sys.stdout)
>>> handler.setLevel(logging.DEBUG)
>>> formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
>>> handler.setFormatter(formatter)
>>> logger.addHandler(handler)
>>> engine = sa.create_engine('redshift+redshift_connector:///databasename', connect_args={'iam': 'True', 'ssl': 'True', 'database': 'databasename', 'db_user': 'dbuser', 'cluster_identifier': 'clusteridentifier'})                                                                             
>>> connection = engine.connect()
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - ===================================
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - User provided connection arguments
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - ===================================
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - {'access_key_id': None, 'allow_db_user_override': False, 'app_id': None, 'app_name': 'amazon_aws_redshift', 'application_name': 'sqlalchemy-redshift', 'auth_profile': None, 'auto_create': False, 'client_id': None, 'client_protocol_version': 2, 'client_secret': None, 'cluster_identifier': 'clusteridentifier', 'credentials_provider': None, 'database_metadata_current_db_only': True, 'db_groups': [], 'db_name': 'databasename', 'db_user': 'dbuser', 'duration': 900, 'endpoint_url': None, 'force_lowercase': False, 'host': '', 'iam': 'True', 'iam_disable_cache': False, 'idp_host': None, 'idp_response_timeout': 120, 'idp_tenant': None, 'idpPort': 443, 'listen_port': 7890, 'login_url': None, 'max_prepared_statements': 1000, 'partner_sp_id': None, 'password': '', 'port': 5439, 'preferred_role': None, 'principal': None, 'profile': None, 'region': None, 'replication': None, 'role_arn': None, 'role_session_name': None, 'secret_access_key': None, 'session_token': None, 'source_address': None, 'ssl': 'True', 'ssl_insecure': True, 'sslmode': 'verify-full', 'tcp_keepalive': True, 'timeout': None, 'unix_sock': None, 'user_name': '', 'web_identity_token': None, 'provider_name': None, 'scope': '', 'numeric_to_float': False, 'is_serverless': False, 'serverless_acct_id': None, 'serverless_work_group': None, 'group_federation': False, 'is_serverless_host': False, '_is_serverless': False}
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - ===================================
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - ===================================
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - Connection arguments following validation and IAM auth (if applicable)
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - ===================================
2022-08-14 12:04:28,609 - redshift_connector - DEBUG - {'access_key_id': None, 'allow_db_user_override': False, 'app_id': None, 'app_name': 'amazon_aws_redshift', 'application_name': 'sqlalchemy-redshift', 'auth_profile': None, 'auto_create': False, 'client_id': None, 'client_protocol_version': 2, 'client_secret': None, 'cluster_identifier': 'clusteridentifier', 'credentials_provider': None, 'database_metadata_current_db_only': True, 'db_groups': [], 'db_name': 'databasename', 'db_user': 'dbuser', 'duration': 900, 'endpoint_url': None, 'force_lowercase': False, 'host': '', 'iam': 'True', 'iam_disable_cache': False, 'idp_host': None, 'idp_response_timeout': 120, 'idp_tenant': None, 'idpPort': 443, 'listen_port': 7890, 'login_url': None, 'max_prepared_statements': 1000, 'partner_sp_id': None, 'password': '', 'port': 5439, 'preferred_role': None, 'principal': None, 'profile': None, 'region': None, 'replication': None, 'role_arn': None, 'role_session_name': None, 'secret_access_key': None, 'session_token': None, 'source_address': None, 'ssl': 'True', 'ssl_insecure': True, 'sslmode': '', 'tcp_keepalive': True, 'timeout': None, 'unix_sock': None, 'user_name': '', 'web_identity_token': None, 'provider_name': None, 'scope': '', 'numeric_to_float': False, 'is_serverless': False, 'serverless_acct_id': None, 'serverless_work_group': None, 'group_federation': False, 'is_serverless_host': False, '_is_serverless': False}
2022-08-14 12:04:28,610 - redshift_connector - DEBUG - ===================================
2022-08-14 12:04:28,610 - redshift_connector.cursor - DEBUG - Cursor.paramstyle=named
2022-08-14 12:04:28,611 - redshift_connector.core - DEBUG - ===================================
2022-08-14 12:04:28,611 - redshift_connector.core - DEBUG - Establishing a connection
2022-08-14 12:04:28,611 - redshift_connector.core - DEBUG - {'user': '', 'database': 'databasename', 'application_name': 'sqlalchemy-redshift', 'replication': None, 'client_protocol_version': '2', 'driver_version': 'Redshift Python Driver 2.0.908', 'os_version': 'Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.2.5'}
2022-08-14 12:04:28,612 - redshift_connector.core - DEBUG - ===================================
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 585, in __init__
    self._usock.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/__init__.py", line 340, in connect
    return Connection(
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 632, in __init__
    raise InterfaceError("communication error", e)
redshift_connector.error.InterfaceError: ('communication error', ConnectionRefusedError(111, 'Connection refused'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2263, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 104, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2369, in raw_connection
    return self._wrap_pool_connect(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2339, in _wrap_pool_connect
    Connection._handle_dbapi_exception_noconnection(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1583, in _handle_dbapi_exception_noconnection
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/__init__.py", line 340, in connect
    return Connection(
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 632, in __init__
    raise InterfaceError("communication error", e)
sqlalchemy.exc.InterfaceError: (redshift_connector.error.InterfaceError) ('communication error', ConnectionRefusedError(111, 'Connection refused'))
(Background on this error at: http://sqlalche.me/e/13/rvf5)

mdeshmu avatar Aug 14 '22 12:08 mdeshmu

Thanks for reporting issue. We will take a look and get back to you.

iggarish avatar Aug 15 '22 18:08 iggarish

thanks @iggarish for the reply. Please let me know once you find the root cause of the issue and resolution plan.

mdeshmu avatar Aug 17 '22 01:08 mdeshmu

@iggarish we are waiting for this issue to be resolved. Please let us know if there are any updates on this issue and any ETA to fix this. Thanks.

mdeshmu avatar Aug 22 '22 02:08 mdeshmu

Hi @mdeshmu ,

thank you for providing the log and stack trace. this information is very helpful in debugging this issue.

I notice that the log shows the 'host' is unset when attempting to establish a connection. Are you are manually removing the value of 'host' from the log? If this is not the case, please let me know and I can continue to investigate.

If a value for host is present, this connection refused message likely points to the Redshift cluster refusing the connection due to something like the client machine not having access to the Redshift cluster VPC. With this issue in mind, Superset and sqlalchemy do nothing but pass the connection string to the driver, which establishes the connection.

Also, you mentioned

It works fine when i directly login to ecs container where superset is hosted and create/execute the cursor but When i try to create a database connection from super ui then I am getting ConnectionRefusedError.

This seems like something to look into more. When using Super UI how are you providing AWS credentials to redshift_connector? How are they passed when logging in directly to the ECS container?

Brooke-white avatar Aug 22 '22 16:08 Brooke-white

I assume you're pulling from AWS default profile as that's the default way to get credentials given your connection parameters but want to confirm this. I tried to reproduce this (passwordless IAM auth using credentials from AWS default profile) using redshift_connector directly but could not find any issue.

Could the AWS default profile not be present when you're using the superset UI? Or maybe some different cluster is specified that you don't have access to? I'd recommend to confirm it is as intended.

Brooke-white avatar Aug 22 '22 16:08 Brooke-white

@Brooke-white Thank you so much for the detailed reply.

This issue is independent of Apache Superset.

We have a Fargate ECS task with IAM role having following permissions.

        {
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentials",
            "Resource": [
                "arn:aws:redshift:<region>:<account-number>:dbname:<redshift-clustername>/<redshift-databasename>",
                "arn:aws:redshift:<region>:<account-number>:dbuser:<redshift-clustername>/<redshift-databaseuser>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "redshift:DescribeClusters",
            "Resource": [
                "arn:aws:redshift:<region>:<account-number>:cluster:<redshift-clustername>"
            ]
        }

I am not specifying the host in connection. I am just specifying cluster_identifier. redshift_connector when used with iam=True gets the cluster host uri using redshift:DescribeClusters. Also it gets the temporary password for the user i sepcified using redshift:GetClusterCredentials call.

I login to ECS Task using ECS exec and run following commands:

conn = redshift_connector.connect(
     iam=True,
     region='us-east-1',
     database='<redshift-databasename>',
     db_user='<redshift-databaseuser>',
     cluster_identifier='<redshift-clustername>',
  )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("select * from svv_columns limit 1")
result: tuple = cursor.fetchall()
print(result)

Above commands work completely fine and I get the data in response. Below are the selective debug logs when above commands were run.

2022-08-23 02:04:38,091 - redshift_connector - DEBUG - ===================================
2022-08-23 02:04:38,091 - redshift_connector.idp_auth_helper - DEBUG - boto3 version: 1.24.55
2022-08-23 02:04:38,092 - redshift_connector.idp_auth_helper - DEBUG - botocore version: 1.27.55
2022-08-23 02:04:38,092 - redshift_connector.iam_helper - DEBUG - AWS Credentials provider will be used for authentication

2022-08-23 02:04:38,101 - redshift_connector.iam_helper - DEBUG - boto3.client(service_name=redshift) being used for IAM auth
2022-08-23 02:04:38,102 - botocore.utils - DEBUG - IMDS ENDPOINT: http://169.254.169.254/

2022-08-23 02:04:38,104 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 169.254.170.2:80

2022-08-23 02:04:38,138 - botocore.endpoint - DEBUG - Making request for OperationModel(name=DescribeClusters) with params:  {'url_path': '/', 'query_string': '', 'method': 'POST', 'headers': {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': 'Boto3/1.24.55 Python/3.8.12 Linux/4.14.281-212.502.amzn2.x86_64 exec-env/AWS_ECS_FARGATE Botocore/1.27.55'}, 'body': {'Action': 'DescribeClusters', 'Version': '2012-12-01', 'ClusterIdentifier': '<redshift-clustername>'}, 'url': 'https://redshift.us-east-1.amazonaws.com/', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7fb6f7f4b370>, 'has_streaming_input': False, 'auth_type': None}}

2022-08-23 02:04:38,140 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): redshift.us-east-1.amazonaws.com:443
2022-08-23 02:04:38,386 - urllib3.connectionpool - DEBUG - https://redshift.us-east-1.amazonaws.com:443 "POST / HTTP/1.1" 200 6230
2022-08-23 02:04:38,386 - botocore.parsers - DEBUG - Response headers: {'x-amzn-RequestId': '79fe019c-150b-4901-970d-d0372c8c1a95', 'Content-Type': 'text/xml', 'Content-Length': '6230', 'vary': 'accept-encoding', 'Date': 'Tue, 23 Aug 2022 02:04:38 GMT'}
2022-08-23 02:04:38,386 - botocore.parsers - DEBUG - Response body: b'<DescribeClustersResponse xmlns="http://redshift.amazonaws.com/doc/2012-12-01/">\n  <DescribeClustersResult>\n    <Clusters>\n      <Cluster>\n        <ClusterIdentifier><redshift-clustername></ClusterIdentifier>

xxx some more xml xxxx

2022-08-23 02:04:38,391 - redshift_connector.iam_helper - DEBUG - Searching credential cache for temporary AWS credentials. Found: True Expiration: 2022-08-23 02:13:01.300000+00:00
2022-08-23 02:04:38,391 - redshift_connector.iam_helper - DEBUG - Using temporary aws credentials with expiration: 2022-08-23 02:13:01.300000+00:00
2022-08-23 02:04:38,391 - redshift_connector - DEBUG - ===================================
2022-08-23 02:04:38,392 - redshift_connector - DEBUG - Connection arguments following validation and IAM auth (if applicable)
2022-08-23 02:04:38,392 - redshift_connector - DEBUG - ===================================
2022-08-23 02:04:38,392 - redshift_connector - DEBUG - {'access_key_id': None, 'allow_db_user_override': False, 'app_id': None, 'app_name': 'amazon_aws_redshift', 'application_name': None, 'auth_profile': None, 'auto_create': False, 'client_id': None, 'client_protocol_version': 2, 'client_secret': None, 'cluster_identifier': '<redshift-clustername>', 'credentials_provider': None, 'database_metadata_current_db_only': True, 'db_groups': [], 'db_name': '<redshift-databasename>', 'db_user': '<redshift-databaseuser>', 'duration': 900, 'endpoint_url': None, 'force_lowercase': False, 'host': '<redshift-host-uri>', 'iam': True, 'iam_disable_cache': False, 'idp_host': None, 'idp_response_timeout': 120, 'idp_tenant': None, 'idpPort': 443, 'listen_port': 7890, 'login_url': None, 'max_prepared_statements': 1000, 'partner_sp_id': None, 'password': '***', 'port': 5439, 'preferred_role': None, 'principal': None, 'profile': None, 'region': 'us-east-1', 'replication': None, 'role_arn': None, 'role_session_name': None, 'secret_access_key': None, 'session_token': None, 'source_address': None, 'ssl': True, 'ssl_insecure': True, 'sslmode': 'verify-ca', 'tcp_keepalive': True, 'timeout': None, 'unix_sock': None, 'user_name': 'IAM:<redshift-databaseuser>', 'web_identity_token': None, 'provider_name': None, 'scope': '', 'numeric_to_float': False, 'is_serverless': False, 'serverless_acct_id': None, 'serverless_work_group': None, 'group_federation': False, 'is_serverless_host': False, '_is_serverless': False}
2022-08-23 02:04:38,392 - redshift_connector - DEBUG - ===================================
2022-08-23 02:04:38,392 - redshift_connector.cursor - DEBUG - Cursor.paramstyle=named
2022-08-23 02:04:38,398 - redshift_connector.core - DEBUG - ===================================
2022-08-23 02:04:38,398 - redshift_connector.core - DEBUG - Establishing a connection
2022-08-23 02:04:38,398 - redshift_connector.core - DEBUG - {'user': 'IAM:<redshift-databaseuser>', 'database': '<redshift-databasename>', 'application_name': '', 'replication': None, 'client_protocol_version': '2', 'driver_version': 'Redshift Python Driver 2.0.908', 'os_version': 'Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.2.5'}
2022-08-23 02:04:38,398 - redshift_connector.core - DEBUG - ===================================
2022-08-23 02:04:38,437 - redshift_connector.cursor - DEBUG - Cursor.paramstyle=format
2022-08-23 02:04:38,437 - redshift_connector.core - DEBUG - Sending start-up message
2022-08-23 02:04:39,273 - redshift_connector.core - DEBUG - Server indicated EXTENDED_RESULT_METADATA transfer protocol will be used rather than protocol requested by client: BINARY

mdeshmu avatar Aug 23 '22 02:08 mdeshmu

But if use redshift_connector behind sqlalchemy-redshift then it doesn't work. Below are commands I am running directly from ECS task using ecs exec and not from Apache Superset UI (Hence no relation with Apache Superset).

import redshift_connector
import logging
import sys
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

engine = sa.create_engine('redshift+redshift_connector:///<redshift-databasename>', connect_args={'iam': 'True', 'region': 'us-east-1', 'db_user': '<redshift-databaseuser>', 'cluster_identifier': '<redshift-clustername>'})
                                                                           
connection = engine.connect()

2022-08-23 02:41:07,710 - redshift_connector - DEBUG - ===================================
2022-08-23 02:41:07,710 - redshift_connector - DEBUG - User provided connection arguments
2022-08-23 02:41:07,710 - redshift_connector - DEBUG - ===================================
2022-08-23 02:41:07,710 - redshift_connector - DEBUG - {'access_key_id': None, 'allow_db_user_override': False, 'app_id': None, 'app_name': 'amazon_aws_redshift', 'application_name': 'sqlalchemy-redshift', 'auth_profile': None, 'auto_create': False, 'client_id': None, 'client_protocol_version': 2, 'client_secret': None, 'cluster_identifier': '<redshift-clustername>', 'credentials_provider': None, 'database_metadata_current_db_only': True, 'db_groups': [], 'db_name': '<redshift-databasename>', 'db_user': '<redshift-databaseuser>', 'duration': 900, 'endpoint_url': None, 'force_lowercase': False, 'host': '', 'iam': 'True', 'iam_disable_cache': False, 'idp_host': None, 'idp_response_timeout': 120, 'idp_tenant': None, 'idpPort': 443, 'listen_port': 7890, 'login_url': None, 'max_prepared_statements': 1000, 'partner_sp_id': None, 'password': '', 'port': 5439, 'preferred_role': None, 'principal': None, 'profile': None, 'region': 'us-east-1', 'replication': None, 'role_arn': None, 'role_session_name': None, 'secret_access_key': None, 'session_token': None, 'source_address': None, 'ssl': True, 'ssl_insecure': True, 'sslmode': 'verify-full', 'tcp_keepalive': True, 'timeout': None, 'unix_sock': None, 'user_name': '', 'web_identity_token': None, 'provider_name': None, 'scope': '', 'numeric_to_float': False, 'is_serverless': False, 'serverless_acct_id': None, 'serverless_work_group': None, 'group_federation': False, 'is_serverless_host': False, '_is_serverless': False}
2022-08-23 02:41:07,710 - redshift_connector - DEBUG - ===================================
2022-08-23 02:41:07,711 - redshift_connector - DEBUG - ===================================
2022-08-23 02:41:07,711 - redshift_connector - DEBUG - Connection arguments following validation and IAM auth (if applicable)
2022-08-23 02:41:07,711 - redshift_connector - DEBUG - ===================================
2022-08-23 02:41:07,711 - redshift_connector - DEBUG - {'access_key_id': None, 'allow_db_user_override': False, 'app_id': None, 'app_name': 'amazon_aws_redshift', 'application_name': 'sqlalchemy-redshift', 'auth_profile': None, 'auto_create': False, 'client_id': None, 'client_protocol_version': 2, 'client_secret': None, 'cluster_identifier': '<redshift-clustername>', 'credentials_provider': None, 'database_metadata_current_db_only': True, 'db_groups': [], 'db_name': '<redshift-databasename>', 'db_user': '<redshift-databaseuser>', 'duration': 900, 'endpoint_url': None, 'force_lowercase': False, 'host': '', 'iam': 'True', 'iam_disable_cache': False, 'idp_host': None, 'idp_response_timeout': 120, 'idp_tenant': None, 'idpPort': 443, 'listen_port': 7890, 'login_url': None, 'max_prepared_statements': 1000, 'partner_sp_id': None, 'password': '', 'port': 5439, 'preferred_role': None, 'principal': None, 'profile': None, 'region': 'us-east-1', 'replication': None, 'role_arn': None, 'role_session_name': None, 'secret_access_key': None, 'session_token': None, 'source_address': None, 'ssl': True, 'ssl_insecure': True, 'sslmode': 'verify-full', 'tcp_keepalive': True, 'timeout': None, 'unix_sock': None, 'user_name': '', 'web_identity_token': None, 'provider_name': None, 'scope': '', 'numeric_to_float': False, 'is_serverless': False, 'serverless_acct_id': None, 'serverless_work_group': None, 'group_federation': False, 'is_serverless_host': False, '_is_serverless': False}
2022-08-23 02:41:07,711 - redshift_connector - DEBUG - ===================================
2022-08-23 02:41:07,711 - redshift_connector.cursor - DEBUG - Cursor.paramstyle=named
2022-08-23 02:41:07,711 - redshift_connector.core - DEBUG - ===================================
2022-08-23 02:41:07,711 - redshift_connector.core - DEBUG - Establishing a connection
2022-08-23 02:41:07,711 - redshift_connector.core - DEBUG - {'user': '', 'database': '<redshift-databasename>', 'application_name': 'sqlalchemy-redshift', 'replication': None, 'client_protocol_version': '2', 'driver_version': 'Redshift Python Driver 2.0.908', 'os_version': 'Linux-4.14.281-212.502.amzn2.x86_64-x86_64-with-glibc2.2.5'}
2022-08-23 02:41:07,712 - redshift_connector.core - DEBUG - ===================================
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 585, in __init__
    self._usock.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/__init__.py", line 340, in connect
    return Connection(
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 632, in __init__
    raise InterfaceError("communication error", e)
redshift_connector.error.InterfaceError: ('communication error', ConnectionRefusedError(111, 'Connection refused'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2263, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 104, in __init__
    else engine.raw_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2369, in raw_connection
    return self._wrap_pool_connect(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2339, in _wrap_pool_connect
    Connection._handle_dbapi_exception_noconnection(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1583, in _handle_dbapi_exception_noconnection
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/__init__.py", line 340, in connect
    return Connection(
  File "/usr/local/lib/python3.8/site-packages/redshift_connector/core.py", line 632, in __init__
    raise InterfaceError("communication error", e)
sqlalchemy.exc.InterfaceError: (redshift_connector.error.InterfaceError) ('communication error', ConnectionRefusedError(111, 'Connection refused'))
(Background on this error at: http://sqlalche.me/e/13/rvf5)

I am passing the same parameters in both cases but in second case, redshift_connector doesn't take temporary credentials from task iam role, and hence doesn't make GetClusterCredentials call and so on. It i explicitly specify host then it asks for password.

I have replaced the real values with <redshift-clustername>, <redshift-databasename> and <redshift-databaseuser> wherever values were present for a key. If values were blank for a key, I haven't touched it.

Above has been tested with: SQLAlchemy==1.3.24 or SQLAlchemy==1.4.40 sqlalchemy-redshift==0.8.11 redshift-connector==2.0.908

mdeshmu avatar Aug 23 '22 02:08 mdeshmu

Hey @mdeshmu , thanks for the continued explanation. Reviewing the logs from the non-working example I see the IAM authentication workflow is not invoked, and there is no call to describeCluster being made, and therefore host is unset when we attempt to connect. This is likely why issue is seen here.

I believe this issue is happening due to how connect_args is defined

engine = sa.create_engine('redshift+redshift_connector:///<redshift-databasename>', connect_args={'iam': 'True', 'region': 'us-east-1', 'db_user': '<redshift-databaseuser>', 'cluster_identifier': '<redshift-clustername>'})

Could you try passing 'iam': True rather than 'iam': 'True'?

While not exactly inline with your use case, we have a tutorial on using sqlalchemy-redshift + redshift_connector for Identity provider authentication (which uses IAM authentication) here: https://aws.amazon.com/blogs/big-data/use-the-amazon-redshift-sqlalchemy-dialect-to-interact-with-amazon-redshift/

I hope this helps :)

Brooke-white avatar Aug 23 '22 16:08 Brooke-white

Thanks a lot @Brooke-white for pointing that out. I can connect now from Superset to Redshift using redshift_connector via IAM authentication.

mdeshmu avatar Aug 24 '22 01:08 mdeshmu