langchain DOC: Get only the sql query not the output of the query in SQLDatabaseChain

Issue with current documentation:

Hi, I need to get just the query only for a natural language input query. I don't want to get the results/output of the query from the db, how do I do that?

Thanks!

Idea or request for content:

No response

May 17 '23 12:05 gopesh97

I would also like to know

May 17 '23 12:05 Tajcore

I have this need as well. The patch is straightforward (see https://github.com/hwchase17/langchain/blob/master/langchain/chains/sql_database/base.py) and I'm happy to open a PR on this if one of the maintainers (@imeckr ? @hwchase17 ?) wants to comment on the best approach.

I'd propose setting a return_sql=True similar to return_direct=True, and more-or-less implementing how return_direct works.

(My use case is that I want to do some processing of the SQL query before execution.)

May 17 '23 21:05 richarddli

I am not maintainer but I find the feature request super useful. For what I can see I would recommend creating another chain SQLQueryGenChain and not introduce a new parameter, also use it inSQLDatabaseChain (composability). Though this approach would be different from the existing approach where depending upon parameters such as return_direct or use_query_checker llm chains are being called inside the _call method. And, if we would have a SQLQueryCheckerChain it allows independent use as will as inside SQLDatabaseChain as well.

What do you think @hwchase17 @dev2049 ? I could pick this up if you find it interesting.

May 18 '23 11:05 imeckr

Isn't it possible to actually duplicate the functionality of the sql generator with a normal LLMChain?

May 18 '23 14:05 Tajcore

Yes very much possible but it would be great to have it handy though.

May 19 '23 07:05 imeckr

hi, I was looking to add an additional function that could be passed to the SQLDatabaseChain. The idea being that this function is passed the generated sql and could sanity check it before it is executed. Currently the flat query_checker_prompt would send the resulting sql back to llm to verify it's correct. What I would like to do is to be able to check, using something like sqlparse, that the syntax is correct or that the query that is returned could be adjusted for additional filtering/checking.

Happy to submit a PR for this, but would like to have suggestions on how best this ought to be implemented.

Please see the change here:

https://github.com/lingster/langchain/commit/8909a0afdf03b2353e4239a3482f8efe56bf2bd6

May 20 '23 15:05 lingster

What I did instead was:

Goto the file langchain/chains/sql_database/base.py and set return_intermediate_steps to True

class SQLDatabaseChain(Chain):
    """Chain for interacting with SQL Database.

    Example:
        .. code-block:: python

            from langchain import SQLDatabaseChain, OpenAI, SQLDatabase
            db = SQLDatabase(...)
            db_chain = SQLDatabaseChain(llm=OpenAI(), database=db)
    """

    llm: BaseLanguageModel
    """LLM wrapper to use."""
    database: SQLDatabase = Field(exclude=True)
    """SQL Database to connect to."""
    prompt: BasePromptTemplate = PROMPT
    """Prompt to use to translate natural language to SQL."""
    top_k: int = 5
    """Number of results to return from the query"""
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:
    return_intermediate_steps: bool = True
    """Whether or not to return the intermediate steps along with the final answer."""
    return_direct: bool = False
    """Whether or not to return the result of querying the SQL table directly."""

Goto the file langchain/chains/base.py and updated run function as:

    def run(self, *args: Any, **kwargs: Any) -> str:
        """Run the chain as text in, text out or multiple variables, text out."""
        # if len(self.output_keys) != 1:
        #     raise ValueError(
        #         f"`run` not supported when there is not exactly "
        #         f"one output key. Got {self.output_keys}."
        #     )

        if args and not kwargs:
            if len(args) != 1:
                raise ValueError("`run` supports only one positional argument.")
            
            return self(args[0])[self.output_keys[0]], self(args[0])[self.output_keys[1]]

        if kwargs and not args:
            return self(kwargs)[self.output_keys[0]]

        raise ValueError(
            f"`run` supported with either positional arguments or keyword arguments"
            f" but not both. Got args: {args} and kwargs: {kwargs}."
        )

Hence, to make a request, I just run this:

db = SQLDatabase.from_uri(os.getenv('POSTGRES_URI'))
llm = OpenAI(model_name="text-davinci-003", temperature=0.0)
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
output, sql_cmd = db_chain.run(str(query))

Hence, it returns both the results and the sql query.