langchain SQLDatabaseChain malformed queries

It looks like the result returned from the predict call to generate the query is returned surrounded by double quotes, so when passed to the db it's taken as a malformed query.

Example (using a postgres db):

llm = ChatOpenAI(temperature=0, model_name="gpt-4") # type: ignore
db = SQLDatabase.from_uri("<URI>")
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
print(db_chain.run("give me any row"))

Result:

Traceback (most recent call last):
  File "main.py", line 26, in <module>
    print(db_chain.run("give me any row"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/base.py", line 213, in run
    return self(args[0])[self.output_keys[0]]
           ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/base.py", line 116, in __call__
    raise e
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/base.py", line 113, in __call__
    outputs = self._call(inputs)
              ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/chains/sql_database/base.py", line 88, in _call
    result = self.database.run(sql_cmd)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/sql_database.py", line 176, in run
    cursor = connection.execute(text(command))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "/opt/homebrew/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near ""SELECT data FROM table LIMIT 5""
LINE 1: "SELECT data FROM table LIMIT 5"

Mar 27 '23 10:03 alvarosevilla95

Getting this issue too

Apr 01 '23 17:04 benlevyx

Tried this hack for now:

class MyChatOpenAI(ChatOpenAI):
    def _generate(self, *args, **kwargs):
        res = super()._generate(*args, **kwargs)
        res.generations[0].text = res.generations[0].text.replace("\"", "")
        return res

Apr 01 '23 17:04 benlevyx

I am experiencing the same issue. The hack above doesn't work for me.

Apr 04 '23 12:04 laingc

FWIW, adding an explicit request on the prompt not to do this seems to work pretty consistently. Probably best to handle it via resillient parsing though.

The SQL query should be outputted plainly, do not surround it in quotes or anything else.

The full edited prompt:

Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

The SQL query should be outputted plainly, do not surround it in quotes or anything else.

Only use the tables listed below.

{table_info}

Question: {input}

Apr 05 '23 10:04 alvarosevilla95

The prompt is giving this error:

"ValidationError: 1 validation error for PromptTemplate root Invalid prompt schema; check for mismatched or missing input parameters. 'top_k' (type=value_error)"

Any idea how can I resolve?

Apr 28 '23 10:04 raisa314

better to remove the quotes in the prompt instead of asking for quotes and the asking to remove them:

from createDatabase import db

_sqlite_prompt = """You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database. Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers. Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Use the following format:

Question: Question here SQLQuery: SQL Query to run SQLResult: Result of the SQLQuery Answer: Final answer here

Only use the following tables: {table_info}

Question: {input}"""

_sqlite_prompt = PromptTemplate( input_variables=["input", "table_info", "top_k"], template=_sqlite_prompt, )

sql_chain = SQLDatabaseChain.from_llm(llm=llm, db=db, verbose=True, prompt=_sqlite_prompt)

May 04 '23 19:05 hansvdam

Removing the quotes from the prompt worked for me. Here is the code with some small fixes that worked.

SQLITE_PROMPT_TEXT = '''You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here

Only use the following tables:
{table_info}

Question: {input}'''

SQLITE_PROMPT = PromptTemplate(input_variables=['input', 'table_info', 'top_k'], template=SQLITE_PROMPT_TEXT)
sql_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True, prompt=SQLITE_PROMPT, return_intermediate_steps=True)

sql_chain('How many movies are there?')

May 10 '23 00:05 asimjalis

It's still querying the wrong column names even tho I put, "Pay attention to use only the column names you can see in the tables below."

Jul 25 '23 01:07 linhmx

@linhmx I am also facing same issue. Is there any way we can handle the error when wrong query generated in language model. We do not want to use agent from langchain.

Aug 09 '23 09:08 msc1987

Hi, @alvarosevilla95! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue is that the result returned from the predict call is surrounded by double quotes, causing it to be interpreted as a malformed query when passed to the database. One user suggested a hack to remove the quotes, but it didn't work for everyone. Another user suggested adding an explicit request in the prompt to output the SQL query plainly, which seemed to work for some users. It's great to see that there have been discussions about resolving other errors related to the prompt and handling wrong queries generated by the language model.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain project!

Nov 08 '23 16:11 dosubot[bot]

@msc1987 @linhmx did u get any method to resolve issue of querying the wrong column name?

Jan 24 '24 10:01 jyoti194