langchain
langchain copied to clipboard
SQLDatabase and SQLDatabaseChain with AWS Athena
System Info
langchain==0.0.216 langchainplus-sdk==0.0.17 python==3.10
I'm trying to connect SQLDatabaseChain to AWS Athena and getting the following error:
conString = f"awsathena+rest://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@athena.{AWS_REGION_ID}.amazonaws.com/{DATABASE}"
engine_args={
's3_staging_dir': "s3://mybuckets3/",
'work_group':'primary'
}
db = SQLDatabase.from_uri(database_uri=conString, engine_args=engine_args)
TypeError Traceback (most recent call last)
Cell In[14], line 2
1 #db = SQLDatabase.from_uri(conString)
----> 2 db = SQLDatabase.from_uri(database_uri=conString, engine_args=engine_args)
File ~\.conda\envs\generativeai\lib\site-packages\langchain\sql_database.py:124, in SQLDatabase.from_uri(cls, database_uri, engine_args, **kwargs)
122 """Construct a SQLAlchemy engine from URI."""
123 _engine_args = engine_args or {}
--> 124 return cls(create_engine(database_uri, **_engine_args), **kwargs)
File <string>:2, in create_engine(url, **kwargs)
File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\util\deprecations.py:281, in deprecated_params.<locals>.decorate.<locals>.warned(fn, *args, **kwargs)
274 if m in kwargs:
275 _warn_with_version(
276 messages[m],
277 versions[m],
278 version_warnings[m],
279 stacklevel=3,
280 )
--> 281 return fn(*args, **kwargs)
File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\engine\create.py:680, in create_engine(url, **kwargs)
678 # all kwargs should be consumed
679 if kwargs:
--> 680 raise TypeError(
681 "Invalid argument(s) %s sent to create_engine(), "
682 "using configuration %s/%s/%s. Please check that the "
683 "keyword arguments are appropriate for this combination "
684 "of components."
685 % (
686 ",".join("'%s'" % k for k in kwargs),
687 dialect.__class__.__name__,
688 pool.__class__.__name__,
689 engineclass.__name__,
690 )
691 )
693 engine = engineclass(pool, dialect, u, **engine_args)
695 if _initialize:
TypeError: Invalid argument(s) 's3_staging_dir','work_group' sent to create_engine(), using configuration AthenaRestDialect/QueuePool/Engine. Please check that the keyword arguments are appropriate for this combination of components.
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [X] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [X] Agents / Agent Executors
- [ ] Tools / Toolkits
- [X] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
Above
Expected behavior
Langchain connected to aws athena
Answer generated by a 🤖
Answer
I understand that you're trying to connect to AWS Athena using the SQLDatabaseChain in LangChain, but you're encountering an error. This issue arises because the create_engine function from SQLAlchemy, which LangChain's SQLDatabase.from_uri method uses, doesn't recognize the 's3_staging_dir' and 'work_group' arguments. These arguments are specific to AWS Athena and aren't standard SQLAlchemy arguments.
To address this, one potential solution is to create a custom class AthenaDatabase that inherits from SQLDatabase and overrides the from_uri method to handle these specific arguments. Here's how you can do it:
from langchain import SQLDatabase
from pyathena import connect
class AthenaDatabase(SQLDatabase):
@classmethod
def from_uri(cls, database_uri, s3_staging_dir, work_group, **kwargs):
connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
s3_staging_dir=s3_staging_dir,
region_name=AWS_REGION_ID,
work_group=work_group)
return cls(connection, **kwargs)
You can then use this new class to connect to AWS Athena like this:
db = AthenaDatabase.from_uri(database_uri=conString, s3_staging_dir="s3://mybuckets3/", work_group='primary')
Please ensure that you have the pyathena package installed. If not, you can install it using pip:
pip install PyAthena
This solution is specifically tailored for AWS Athena and may not work with other SQL dialects. If you need to connect to different types of databases, you might need to create a different custom class for each one.
If you find this solution helpful and believe it could benefit other users, I encourage you to make a pull request to update the LangChain documentation. This way, others who encounter the same issue can easily find this solution.
Sources
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
Hi, now I'm facing another issue:
from langchain import SQLDatabase
from pyathena import connect
import pyathena
print(pyathena.__version__)
3.0.3
class AthenaDatabase(SQLDatabase):
@classmethod
def from_uri(cls, database_uri, **kwargs):
connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
s3_staging_dir=S3_STAGING_DIR,
region_name=AWS_REGION_ID,
work_group=WORK_GROUP)
return cls(connection, **kwargs)
conString = f"awsathena+rest://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@athena.{AWS_REGION_ID}.amazonaws.com/{DATABASE}"
db = AthenaDatabase.from_uri(database_uri=conString)
---------------------------------------------------------------------------
NoInspectionAvailable Traceback (most recent call last)
Cell In[11], line 1
----> 1 db = AthenaDatabase.from_uri(database_uri=conString)
Cell In[4], line 9, in AthenaDatabase.from_uri(cls, database_uri, **kwargs)
2 @classmethod
3 def from_uri(cls, database_uri, **kwargs):
4 connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
5 aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
6 s3_staging_dir=S3_STAGING_DIR,
7 region_name=AWS_REGION_ID,
8 work_group=WORK_GROUP)
----> 9 return cls(connection, **kwargs)
File ~\.conda\envs\generativeai\lib\site-packages\langchain\sql_database.py:60, in SQLDatabase.__init__(self, engine, schema, metadata, ignore_tables, include_tables, sample_rows_in_table_info, indexes_in_table_info, custom_table_info, view_support, max_string_length)
57 if include_tables and ignore_tables:
58 raise ValueError("Cannot specify both include_tables and ignore_tables")
---> 60 self._inspector = inspect(self._engine)
62 # including view support by adding the views as well as tables to the all
63 # tables list if view_support is True
64 self._all_tables = set(
65 self._inspector.get_table_names(schema=schema)
66 + (self._inspector.get_view_names(schema=schema) if view_support else [])
67 )
File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\inspection.py:118, in inspect(subject, raiseerr)
115 reg = ret = None
117 if raiseerr and (reg is None or ret is None):
--> 118 raise exc.NoInspectionAvailable(
119 "No inspection system is "
120 "available for object of type %s" % type_
121 )
122 return ret
NoInspectionAvailable: No inspection system is available for object of type <class 'pyathena.connection.Connection'>
Thank you in advance.
Answer generated by a 🤖
Answer
I understand that you're trying to connect to AWS Athena using the SQLDatabaseChain in LangChain and encountering issues. The first error you're seeing is because the create_engine function from SQLAlchemy, which is used in the from_uri method of the SQLDatabase class, does not recognize the arguments 's3_staging_dir' and 'work_group'. These arguments are specific to PyAthena, not SQLAlchemy.
In your second attempt, you're trying to use a PyAthena connection object with the SQLDatabase class. However, this class is designed to work with SQLAlchemy engine objects, not PyAthena connection objects, which is why you're getting the NoInspectionAvailable error.
Currently, the SQLDatabase class in LangChain does not directly support AWS Athena. One potential solution could be to extend the SQLDatabase class to work with AWS Athena, but this would require a deep understanding of both SQLAlchemy and PyAthena, and might involve significant changes to the SQLDatabase class.
Please note that this is just one potential solution and there might be other ways to interact with your AWS Athena database. If you believe this is a bug that could impact other users and you have a solution, we encourage you to make a pull request. If the issue persists, you might want to consider upgrading to the latest version of LangChain.
Sources
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
@clebermarq What version of SQLAlchemy are you using? It looks like PyAthena only supports <2.0.0, which would cause this error
@clebermarq What version of SQLAlchemy are you using? It looks like PyAthena only supports <2.0.0, which would cause this error
Hi, I'm using this version:
SQLAlchemy==2.0.17
Gonna try to use some older version.
Thanks for the tip.
@clebermarq What version of SQLAlchemy are you using? It looks like PyAthena only supports <2.0.0, which would cause this error
Hi, I'm using this version:
SQLAlchemy==2.0.17Gonna try to use some older version.
Thanks for the tip.
Same error:
import sqlalchemy
print(sqlalchemy.__version__)
1.4.0
db = AthenaDatabase.from_uri(database_uri=conString)
---------------------------------------------------------------------------
NoInspectionAvailable Traceback (most recent call last)
Cell In[8], line 1
----> 1 db = AthenaDatabase.from_uri(database_uri=conString)
Cell In[3], line 9, in AthenaDatabase.from_uri(cls, database_uri, **kwargs)
2 @classmethod
3 def from_uri(cls, database_uri, **kwargs):
4 connection = connect(aws_access_key_id=AWS_ACCESS_KEY_ID,
5 aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
6 s3_staging_dir=S3_STAGING_DIR,
7 region_name=AWS_REGION_ID,
8 work_group=WORK_GROUP)
----> 9 return cls(connection, **kwargs)
File ~\.conda\envs\generativeai\lib\site-packages\langchain\sql_database.py:60, in SQLDatabase.__init__(self, engine, schema, metadata, ignore_tables, include_tables, sample_rows_in_table_info, indexes_in_table_info, custom_table_info, view_support, max_string_length)
57 if include_tables and ignore_tables:
58 raise ValueError("Cannot specify both include_tables and ignore_tables")
---> 60 self._inspector = inspect(self._engine)
62 # including view support by adding the views as well as tables to the all
63 # tables list if view_support is True
64 self._all_tables = set(
65 self._inspector.get_table_names(schema=schema)
66 + (self._inspector.get_view_names(schema=schema) if view_support else [])
67 )
File ~\.conda\envs\generativeai\lib\site-packages\sqlalchemy\inspection.py:71, in inspect(subject, raiseerr)
68 reg = ret = None
70 if raiseerr and (reg is None or ret is None):
---> 71 raise exc.NoInspectionAvailable(
72 "No inspection system is "
73 "available for object of type %s" % type_
74 )
75 return ret
NoInspectionAvailable: No inspection system is available for object of type <class 'pyathena.connection.Connection'>
Answer generated by a 🤖
Answer
I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.
Closed Issues
error while calling SQLDatabaseChain on AWS Athena
The suggested solution is: Update the library versions to the following:
- langchain==0.0.166
- sqlalchemy==1.4.47
- PyAthena[SQLAlchemy]==2.25.2
- Python 3.8
Open Issues
error when creating SQLDatabase agent with Amazon Athena
No discussion on issue
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
@clebermarq - did you manage to resolve this in the end? I'm also trying to get an Athena connection working
Hi @mikethemerry , so far nothing :(
Hi, @clebermarq! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding of the issue, you encountered a TypeError when trying to connect SQLDatabaseChain to AWS Athena. You received a suggestion to create a custom class AthenaDatabase that inherits from SQLDatabase and overrides the from_uri method to handle the specific arguments for AWS Athena. However, you encountered another error related to the NoInspectionAvailable when using PyAthena connection objects with the SQLDatabase class.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and cooperation!
seems to be the same status now. just checking if there is an update on this.
Hi, @clebermarq,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. It looks like you encountered a TypeError when trying to connect SQLDatabaseChain to AWS Athena using langchain. You received suggestions to create a custom class AthenaDatabase and tried downgrading the SQLAlchemy version, but the issue persists. The community has provided related closed and open issues that might be helpful in resolving the problem.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and cooperation. If you have any further questions or updates, feel free to reach out.