langchain icon indicating copy to clipboard operation
langchain copied to clipboard

It would be nice to make the SQL helper consider the table and column comments

Open kittenkill opened this issue 1 year ago • 5 comments

Feature request

There are some tables out there with comments. It would be nice if the sql agent could read them.

Motivation

Sometimes, column names do not describe perfectly well what they contain. If sql agent can consider the table and column comments, it will be able to better respond to queries.

Your contribution

Maybe something like this could work for the table comments:

--- sql_database.py.orig        2023-05-23 20:34:09.877909913 -0400
+++ sql_database.py     2023-05-23 20:34:13.857925528 -0400
@@ -268,11 +268,14 @@
             # add create table command
             create_table = str(CreateTable(table).compile(self._engine))
             table_info = f"{create_table.rstrip()}"
+            table_comment = table.comment
             has_extra_info = (
-                self._indexes_in_table_info or self._sample_rows_in_table_info
+                self._indexes_in_table_info or self._sample_rows_in_table_info or table_comment
             )
             if has_extra_info:
                 table_info += "\n\n/*"
+            if table_comment:
+                table_info += f"\nTable comment: {table_comment}\n"
             if self._indexes_in_table_info:
                 table_info += f"\n{self._get_table_indexes(table)}\n"
             if self._sample_rows_in_table_info:

kittenkill avatar May 24 '23 00:05 kittenkill

Would be helfpul for us too.

shadyelgewily-slimstock avatar May 24 '23 07:05 shadyelgewily-slimstock

This would be useful for our project!

KonstantinSDev avatar May 24 '23 07:05 KonstantinSDev

I am deliberately adding table comments to my databases so that an AI can understand them better. This PR would be perfect for that. @kittenkill I'm also not clear whether foreign keys are extracted. If not, this would be useful too.

willsmithorg avatar Jun 04 '23 00:06 willsmithorg

@willsmithorg PK-FK relationships are included when the schema is being sent with the example of rows in the prompt, however, its worth mentioning that there is no comments about those relationships. As far as I know, there is no research no the best way to include column comments, so this needs some experimentation. It could be that adding the comments independently via the context is enough, indeed

KonstantinSDev avatar Jun 05 '23 13:06 KonstantinSDev

I +1 this request.

Scenario: Sql queries generated by langchain sqlagent are good to answer simple questions. But for complex scenarios, majority of queries end-up wrong. For example, the Sql agent used product_price column instead of payment_amount column to calculate sales revenue per product. Similar problem when joining tables with multi-level association (eg: movies, rentals, payment, inventory).

Temp fix: These problems were resolved by providing a few extra lines of details as context followed by the actual question (prompt engineering). Then it was able to return valid sql-queries and answers.

Question: Is there a better way to address this Sqlagent issue? If not, please add a mechanism to input schema glossary to train the model instead of prompt engineering.

Much appreciate your support!

sivavishnubramma avatar Jul 09 '23 08:07 sivavishnubramma

Hi, @kittenkill! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, this issue is a feature request to make the SQL helper consider table and column comments. It seems like this feature would be beneficial for users' projects and for better understanding by AI. There has been a discussion about including foreign keys and column comments, and one user even suggested providing a mechanism to input schema glossary to train the model instead of prompt engineering.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution and understanding!

dosubot[bot] avatar Oct 08 '23 16:10 dosubot[bot]

@dosu-beta No, we need to add our comments from tables and columns to request for improve creating sql request

Aden-Kurmanov avatar Oct 09 '23 08:10 Aden-Kurmanov

@baskaryan Could you please help @Aden-Kurmanov with this issue? They have indicated that it is still relevant and they need assistance in adding comments from tables and columns to improve creating SQL requests. Thank you!

dosubot[bot] avatar Oct 09 '23 08:10 dosubot[bot]

I'm planning to look into this issue, something I've been thinking about as well ....it is very useful to add additional comments from the table metadata, which will give a much better response from the llm. In addition to the table comments, the column level comments can be included in the prompt as well.

hagope avatar Dec 17 '23 16:12 hagope

This is still relevant for us, we have managed to improve slightly by creating a view of our database tables with very clear names, but to us allowing to add additional comments to the schema would help bring our application to production.

shadyelgewily-slimstock avatar Dec 18 '23 08:12 shadyelgewily-slimstock

@kittenkill it looks like this approach of setting the table comment will not work using sqlalchemy, it's only used to set the table comment when the SQL is compiled. sqlalchemy cannot fetch any of the column or table level comments, as far as I can tell from browsing the code.

I'm thinking the best approach here is to include another parameter additional_context_sql to allow the user to fetch metadata for any other relevant data... this does kind of defeat the purpose of avoiding SQL though :)

hagope avatar Dec 20 '23 19:12 hagope