SQLDatabaseChain malformed queries
It looks like the result returned from the predict call to generate the query is returned surrounded by double quotes, so when passed to the db it's taken as a malformed query.
Example (using a postgres db):
llm = ChatOpenAI(temperature=0, model_name="gpt-4") # type: ignore
db = SQLDatabase.from_uri("<URI>")
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
print(db_chain.run("give me any row"))
Result:
Traceback (most recent call last):
File "main.py", line 26, in <module>
print(db_chain.run("give me any row"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/base.py", line 213, in run
return self(args[0])[self.output_keys[0]]
^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/base.py", line 116, in __call__
raise e
File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/base.py", line 113, in __call__
outputs = self._call(inputs)
^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/sql_database/base.py", line 88, in _call
result = self.database.run(sql_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/langchain/sql_database.py", line 176, in run
cursor = connection.execute(text(command))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
self._handle_dbapi_exception(
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
util.raise_(
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near ""SELECT data FROM table LIMIT 5""
LINE 1: "SELECT data FROM table LIMIT 5"
Getting this issue too
Tried this hack for now:
class MyChatOpenAI(ChatOpenAI):
def _generate(self, *args, **kwargs):
res = super()._generate(*args, **kwargs)
res.generations[0].text = res.generations[0].text.replace("\"", "")
return res
I am experiencing the same issue. The hack above doesn't work for me.
FWIW, adding an explicit request on the prompt not to do this seems to work pretty consistently. Probably best to handle it via resillient parsing though.
The SQL query should be outputted plainly, do not surround it in quotes or anything else.
The full edited prompt:
Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
The SQL query should be outputted plainly, do not surround it in quotes or anything else.
Only use the tables listed below.
{table_info}
Question: {input}
The prompt is giving this error:
"ValidationError: 1 validation error for PromptTemplate root Invalid prompt schema; check for mismatched or missing input parameters. 'top_k' (type=value_error)"
Any idea how can I resolve?
better to remove the quotes in the prompt instead of asking for quotes and the asking to remove them:
from createDatabase import db
_sqlite_prompt = """You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database. Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers. Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: Question here SQLQuery: SQL Query to run SQLResult: Result of the SQLQuery Answer: Final answer here
Only use the following tables: {table_info}
Question: {input}"""
_sqlite_prompt = PromptTemplate( input_variables=["input", "table_info", "top_k"], template=_sqlite_prompt, )
sql_chain = SQLDatabaseChain.from_llm(llm=llm, db=db, verbose=True, prompt=_sqlite_prompt)
Removing the quotes from the prompt worked for me. Here is the code with some small fixes that worked.
SQLITE_PROMPT_TEXT = '''You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here
Only use the following tables:
{table_info}
Question: {input}'''
SQLITE_PROMPT = PromptTemplate(input_variables=['input', 'table_info', 'top_k'], template=SQLITE_PROMPT_TEXT)
sql_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True, prompt=SQLITE_PROMPT, return_intermediate_steps=True)
sql_chain('How many movies are there?')
It's still querying the wrong column names even tho I put, "Pay attention to use only the column names you can see in the tables below."
@linhmx I am also facing same issue. Is there any way we can handle the error when wrong query generated in language model. We do not want to use agent from langchain.
Hi, @alvarosevilla95! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue is that the result returned from the predict call is surrounded by double quotes, causing it to be interpreted as a malformed query when passed to the database. One user suggested a hack to remove the quotes, but it didn't work for everyone. Another user suggested adding an explicit request in the prompt to output the SQL query plainly, which seemed to work for some users. It's great to see that there have been discussions about resolving other errors related to the prompt and handling wrong queries generated by the language model.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain project!
@msc1987 @linhmx did u get any method to resolve issue of querying the wrong column name?