langchain icon indicating copy to clipboard operation
langchain copied to clipboard

SQLDatabase and SQLDatabaseChain with AWS Athena

Open clebermarq opened this issue 2 years ago • 7 comments

System Info

langchain==0.0.216 langchainplus-sdk==0.0.17 python==3.10

I'm trying to connect SQLDatabaseChain to AWS Athena and getting the following error:

conString = f"awsathena+rest://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@athena.{AWS_REGION_ID}.amazonaws.com/{DATABASE}"

engine_args={
                's3_staging_dir': "s3://mybuckets3/",
                'work_group':'primary'
            }

db = SQLDatabase.from_uri(database_uri=conString, engine_args=engine_args)

TypeError                                 Traceback (most recent call last)
Cell In[14], line 2
      1 #db = SQLDatabase.from_uri(conString)
----> 2 db = SQLDatabase.from_uri(database_uri=conString, engine_args=engine_args)

File ~\.conda\envs\generativeai\lib\site-packages\langchain\sql_database.py:124, in SQLDatabase.from_uri(cls, database_uri, engine_args, **kwargs)
    122 """Construct a SQLAlchemy engine from URI."""
    123 _engine_args = engine_args or {}
--> 124 return cls(create_engine(database_uri, **_engine_args), **kwargs)

File <string>:2, in create_engine(url, **kwargs)

File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\util\deprecations.py:281, in deprecated_params.<locals>.decorate.<locals>.warned(fn, *args, **kwargs)
    274     if m in kwargs:
    275         _warn_with_version(
    276             messages[m],
    277             versions[m],
    278             version_warnings[m],
    279             stacklevel=3,
    280         )
--> 281 return fn(*args, **kwargs)

File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\engine\create.py:680, in create_engine(url, **kwargs)
    678 # all kwargs should be consumed
    679 if kwargs:
--> 680     raise TypeError(
    681         "Invalid argument(s) %s sent to create_engine(), "
    682         "using configuration %s/%s/%s.  Please check that the "
    683         "keyword arguments are appropriate for this combination "
    684         "of components."
    685         % (
    686             ",".join("'%s'" % k for k in kwargs),
    687             dialect.__class__.__name__,
    688             pool.__class__.__name__,
    689             engineclass.__name__,
    690         )
    691     )
    693 engine = engineclass(pool, dialect, u, **engine_args)
    695 if _initialize:

TypeError: Invalid argument(s) 's3_staging_dir','work_group' sent to create_engine(), using configuration AthenaRestDialect/QueuePool/Engine.  Please check that the keyword arguments are appropriate for this combination of components.

Who can help?

No response

Information

  • [ ] The official example notebooks/scripts
  • [X] My own modified scripts

Related Components

  • [X] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [X] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [X] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

Above

Expected behavior

Langchain connected to aws athena

clebermarq avatar Jul 09 '23 13:07 clebermarq

Answer generated by a 🤖

Answer

I understand that you're trying to connect to AWS Athena using the SQLDatabaseChain in LangChain, but you're encountering an error. This issue arises because the create_engine function from SQLAlchemy, which LangChain's SQLDatabase.from_uri method uses, doesn't recognize the 's3_staging_dir' and 'work_group' arguments. These arguments are specific to AWS Athena and aren't standard SQLAlchemy arguments.

To address this, one potential solution is to create a custom class AthenaDatabase that inherits from SQLDatabase and overrides the from_uri method to handle these specific arguments. Here's how you can do it:

from langchain import SQLDatabase
from pyathena import connect

class AthenaDatabase(SQLDatabase):
    @classmethod
    def from_uri(cls, database_uri, s3_staging_dir, work_group, **kwargs):
        connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
                             aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                             s3_staging_dir=s3_staging_dir,
                             region_name=AWS_REGION_ID,
                             work_group=work_group)
        return cls(connection, **kwargs)

You can then use this new class to connect to AWS Athena like this:

db = AthenaDatabase.from_uri(database_uri=conString, s3_staging_dir="s3://mybuckets3/", work_group='primary')

Please ensure that you have the pyathena package installed. If not, you can install it using pip:

pip install PyAthena

This solution is specifically tailored for AWS Athena and may not work with other SQL dialects. If you need to connect to different types of databases, you might need to create a different custom class for each one.

If you find this solution helpful and believe it could benefit other users, I encourage you to make a pull request to update the LangChain documentation. This way, others who encounter the same issue can easily find this solution.

Sources


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] avatar Jul 09 '23 14:07 dosubot[bot]

Hi, now I'm facing another issue:

from langchain import SQLDatabase
from pyathena import connect

import pyathena
print(pyathena.__version__)
3.0.3

class AthenaDatabase(SQLDatabase):
    @classmethod
    def from_uri(cls, database_uri, **kwargs):
        connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
                             aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                             s3_staging_dir=S3_STAGING_DIR,
                             region_name=AWS_REGION_ID,
                             work_group=WORK_GROUP)
        return cls(connection, **kwargs)

conString = f"awsathena+rest://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@athena.{AWS_REGION_ID}.amazonaws.com/{DATABASE}"

db = AthenaDatabase.from_uri(database_uri=conString)

---------------------------------------------------------------------------
NoInspectionAvailable                     Traceback (most recent call last)
Cell In[11], line 1
----> 1 db = AthenaDatabase.from_uri(database_uri=conString)

Cell In[4], line 9, in AthenaDatabase.from_uri(cls, database_uri, **kwargs)
      2 @classmethod
      3 def from_uri(cls, database_uri, **kwargs):
      4     connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
      5                          aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
      6                          s3_staging_dir=S3_STAGING_DIR,
      7                          region_name=AWS_REGION_ID,
      8                          work_group=WORK_GROUP)
----> 9     return cls(connection, **kwargs)

File ~\.conda\envs\generativeai\lib\site-packages\langchain\sql_database.py:60, in SQLDatabase.__init__(self, engine, schema, metadata, ignore_tables, include_tables, sample_rows_in_table_info, indexes_in_table_info, custom_table_info, view_support, max_string_length)
     57 if include_tables and ignore_tables:
     58     raise ValueError("Cannot specify both include_tables and ignore_tables")
---> 60 self._inspector = inspect(self._engine)
     62 # including view support by adding the views as well as tables to the all
     63 # tables list if view_support is True
     64 self._all_tables = set(
     65     self._inspector.get_table_names(schema=schema)
     66     + (self._inspector.get_view_names(schema=schema) if view_support else [])
     67 )

File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\inspection.py:118, in inspect(subject, raiseerr)
    115     reg = ret = None
    117 if raiseerr and (reg is None or ret is None):
--> 118     raise exc.NoInspectionAvailable(
    119         "No inspection system is "
    120         "available for object of type %s" % type_
    121     )
    122 return ret

NoInspectionAvailable: No inspection system is available for object of type <class 'pyathena.connection.Connection'>

Thank you in advance.

clebermarq avatar Jul 10 '23 12:07 clebermarq

Answer generated by a 🤖

Answer

I understand that you're trying to connect to AWS Athena using the SQLDatabaseChain in LangChain and encountering issues. The first error you're seeing is because the create_engine function from SQLAlchemy, which is used in the from_uri method of the SQLDatabase class, does not recognize the arguments 's3_staging_dir' and 'work_group'. These arguments are specific to PyAthena, not SQLAlchemy.

In your second attempt, you're trying to use a PyAthena connection object with the SQLDatabase class. However, this class is designed to work with SQLAlchemy engine objects, not PyAthena connection objects, which is why you're getting the NoInspectionAvailable error.

Currently, the SQLDatabase class in LangChain does not directly support AWS Athena. One potential solution could be to extend the SQLDatabase class to work with AWS Athena, but this would require a deep understanding of both SQLAlchemy and PyAthena, and might involve significant changes to the SQLDatabase class.

Please note that this is just one potential solution and there might be other ways to interact with your AWS Athena database. If you believe this is a bug that could impact other users and you have a solution, we encourage you to make a pull request. If the issue persists, you might want to consider upgrading to the latest version of LangChain.

Sources


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] avatar Jul 10 '23 17:07 dosubot[bot]

@clebermarq What version of SQLAlchemy are you using? It looks like PyAthena only supports <2.0.0, which would cause this error

devstein avatar Jul 10 '23 17:07 devstein

@clebermarq What version of SQLAlchemy are you using? It looks like PyAthena only supports <2.0.0, which would cause this error

Hi, I'm using this version:

SQLAlchemy==2.0.17

Gonna try to use some older version.

Thanks for the tip.

clebermarq avatar Jul 11 '23 17:07 clebermarq

@clebermarq What version of SQLAlchemy are you using? It looks like PyAthena only supports <2.0.0, which would cause this error

Hi, I'm using this version:

SQLAlchemy==2.0.17

Gonna try to use some older version.

Thanks for the tip.

Same error:

import sqlalchemy
print(sqlalchemy.__version__)
1.4.0

db = AthenaDatabase.from_uri(database_uri=conString)

---------------------------------------------------------------------------
NoInspectionAvailable                     Traceback (most recent call last)
Cell In[8], line 1
----> 1 db = AthenaDatabase.from_uri(database_uri=conString)

Cell In[3], line 9, in AthenaDatabase.from_uri(cls, database_uri, **kwargs)
      2 @classmethod
      3 def from_uri(cls, database_uri, **kwargs):
      4     connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
      5                          aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
      6                          s3_staging_dir=S3_STAGING_DIR,
      7                          region_name=AWS_REGION_ID,
      8                          work_group=WORK_GROUP)
----> 9     return cls(connection, **kwargs)

File ~\.conda\envs\generativeai\lib\site-packages\langchain\sql_database.py:60, in SQLDatabase.__init__(self, engine, schema, metadata, ignore_tables, include_tables, sample_rows_in_table_info, indexes_in_table_info, custom_table_info, view_support, max_string_length)
     57 if include_tables and ignore_tables:
     58     raise ValueError("Cannot specify both include_tables and ignore_tables")
---> 60 self._inspector = inspect(self._engine)
     62 # including view support by adding the views as well as tables to the all
     63 # tables list if view_support is True
     64 self._all_tables = set(
     65     self._inspector.get_table_names(schema=schema)
     66     + (self._inspector.get_view_names(schema=schema) if view_support else [])
     67 )

File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\inspection.py:71, in inspect(subject, raiseerr)
     68     reg = ret = None
     70 if raiseerr and (reg is None or ret is None):
---> 71     raise exc.NoInspectionAvailable(
     72         "No inspection system is "
     73         "available for object of type %s" % type_
     74     )
     75 return ret

NoInspectionAvailable: No inspection system is available for object of type <class 'pyathena.connection.Connection'>


clebermarq avatar Jul 11 '23 17:07 clebermarq

Answer generated by a 🤖

Answer

I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.

Closed Issues

error while calling SQLDatabaseChain on AWS Athena

The suggested solution is: Update the library versions to the following:

  • langchain==0.0.166
  • sqlalchemy==1.4.47
  • PyAthena[SQLAlchemy]==2.25.2
  • Python 3.8

Open Issues

error when creating SQLDatabase agent with Amazon Athena

No discussion on issue


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] avatar Jul 11 '23 17:07 dosubot[bot]

@clebermarq - did you manage to resolve this in the end? I'm also trying to get an Athena connection working

mikethemerry avatar Aug 15 '23 15:08 mikethemerry

Hi @mikethemerry , so far nothing :(

clebermarq avatar Aug 16 '23 11:08 clebermarq

Hi, @clebermarq! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding of the issue, you encountered a TypeError when trying to connect SQLDatabaseChain to AWS Athena. You received a suggestion to create a custom class AthenaDatabase that inherits from SQLDatabase and overrides the from_uri method to handle the specific arguments for AWS Athena. However, you encountered another error related to the NoInspectionAvailable when using PyAthena connection objects with the SQLDatabase class.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation!

dosubot[bot] avatar Nov 15 '23 16:11 dosubot[bot]

seems to be the same status now. just checking if there is an update on this.

MrZoidberg avatar Nov 21 '23 18:11 MrZoidberg

Hi, @clebermarq,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. It looks like you encountered a TypeError when trying to connect SQLDatabaseChain to AWS Athena using langchain. You received suggestions to create a custom class AthenaDatabase and tried downgrading the SQLAlchemy version, but the issue persists. The community has provided related closed and open issues that might be helpful in resolving the problem.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or updates, feel free to reach out.

dosubot[bot] avatar Feb 20 '24 16:02 dosubot[bot]